Re: Select statements return null
Hello Sunita,

If you are using JSON, try adding the jar 'hive-json-serde.jar' before you load your data into the final table. Also try making your date attributes String format first, to check whether they are the cause. I don't know whether you are using an external table with regular expressions (regexp) to parse your data; if so, can you send us the table definition and the structure of one row of your data? The final thing I can suggest is to run a MapReduce operation over the table (select count(1) from your_table) and then check the JobTracker log to debug the issue. Hope this helps ;)

2013/7/30 Sunita Arvind sunitarv...@gmail.com

Hi,

I have written a script which generates JSON files, loads them into a dictionary, adds a few attributes, and uploads the modified files to HDFS. After the files are generated, if I perform a select * from ...; on the table which points to this location, I get null, null as the result. I also tried without the added attributes and it did not make a difference. I strongly suspect the data. Currently I am using strip() to eliminate trailing and leading whitespace and newlines. I am wondering whether embedded \n (that is, JSON string objects containing \n in the value) causes such issues. There are no parsing errors, so I am not able to debug this issue. Are there any flags that I can set to figure out what is happening inside the parser code?
I set this:

hive -hiveconf hive.root.logger=DEBUG,console

But the output is not really useful:

blocks=[LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734; getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010, 192.168.1.66:50010, 192.168.1.63:50010]}]
lastLocatedBlock=LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734; getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010, 192.168.1.66:50010, 192.168.1.63:50010]}
isLastBlockComplete=true}
13/07/30 11:49:41 DEBUG hdfs.DFSClient: Connecting to datanode 192.168.1.61:50010
null null null null null null null null
null null null null null null null null
13/07/30 11:49:41 INFO exec.

Also, the attributes I am adding are the current year, month, day, and time, so they are not null for any record. I even moved out the existing files which did not have these fields set, so that there are no records with these fields as null. However, I don't think this is the issue, as the advantage of JSON and the Hive JSON SerDe is that they allow the object structure to be dynamic. Right?

Any suggestion regarding debugging would be very helpful.

thanks
Sunita
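One way to test the embedded-newline suspicion raised above is to validate each generated file as one JSON record per line before uploading it to HDFS, flagging values that contain a newline. This is a minimal sketch, not the poster's actual script; the helper name and the sample records used to exercise it are invented:

```python
import json

def check_json_lines(lines):
    """Validate one-record-per-line JSON and flag embedded newlines.

    Returns a list of (line_number, problem_description) tuples.
    Line-oriented SerDes expect one record per physical line, so a raw
    newline inside a string value would split a record across lines.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError as e:
            problems.append((lineno, "parse error: %s" % e))
            continue
        for key, value in record.items():
            if isinstance(value, str) and "\n" in value:
                problems.append((lineno, "embedded newline in field %r" % key))
    return problems
```

Running this over a generated file before the upload step would show whether any record carries an embedded \n, independently of what the Hive-side parser silently does with it.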
Re: UDFs with package names
Yup, it was the directory structure com/mystuff/whateverUDF.class that was missing. I thought I had tried that before posting my question, but... Thanks for your help!

From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org; Michael Malak michaelma...@yahoo.com
Sent: Tuesday, July 30, 2013 7:06 PM
Subject: Re: UDFs with package names

It might be a better idea to use your own package, com.mystuff.x. You might be running into an issue where Java is not finding the file because it assumes the relation between package and jar is 1 to 1. You might also be compiling incorrectly: if your package is com.mystuff, that class file should be in a directory structure com/mystuff/whateverUDF.class. I am not seeing that in your example.

On Tue, Jul 30, 2013 at 8:00 PM, Michael Malak michaelma...@yahoo.com wrote:

Thus far, I've been able to create Hive UDFs, but now I need to define them within a Java package name (as opposed to the default Java package as I had been doing). Once I do that, I'm no longer able to load them into Hive.
First off, this works:

add jar /usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.3.0.jar;
create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

Then I took the source code for UDFRowSequence.java from http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java and renamed the file, and the class inside it, to UDFRowSequence2. I compile and deploy it with:

javac -cp /usr/lib/hive/lib/hive-exec-0.10.0-cdh4.3.0.jar:/usr/lib/hadoop/hadoop-common.jar UDFRowSequence2.java
jar cvf UDFRowSequence2.jar UDFRowSequence2.class
sudo cp UDFRowSequence2.jar /usr/local/lib

But in Hive, I get the following:

hive> add jar /usr/local/lib/UDFRowSequence2.jar;
Added /usr/local/lib/UDFRowSequence2.jar to class path
Added resource: /usr/local/lib/UDFRowSequence2.jar
hive> create temporary function row_sequence as 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence2';
FAILED: Class org.apache.hadoop.hive.contrib.udf.UDFRowSequence2 not found
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

But if I comment out the package line in UDFRowSequence2.java (to put the UDF into the default Java package), it works:

hive> add jar /usr/local/lib/UDFRowSequence2.jar;
Added /usr/local/lib/UDFRowSequence2.jar to class path
Added resource: /usr/local/lib/UDFRowSequence2.jar
hive> create temporary function row_sequence as 'UDFRowSequence2';
OK
Time taken: 0.383 seconds

What am I doing wrong? I have a feeling it's something simple.
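The resolution in this thread is that the .class entry's path inside the jar must mirror the package name. Since a jar is an ordinary zip archive, you can verify the layout without unpacking it. A rough sketch (the helper names are invented; the package and class names echo this thread's example):

```python
import zipfile

def class_entries(jar_path):
    """List the .class entry paths inside a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]

def has_packaged_class(jar_path, package, class_name):
    """Check that class_name sits under the directory matching its package.

    For package com.mystuff and class WhateverUDF, the jar must contain
    the entry com/mystuff/WhateverUDF.class, not WhateverUDF.class at
    the archive root.
    """
    expected = package.replace(".", "/") + "/" + class_name + ".class"
    return expected in class_entries(jar_path)
```

For the jar built with `jar cvf UDFRowSequence2.jar UDFRowSequence2.class` above, `has_packaged_class(...)` would report False for any non-default package, which matches the "Class ... not found" failure Hive printed.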
Re: Review Request (wikidoc): LZO Compression in Hive
Hi guys

Any chance I could get cwiki update privileges today? Thanks
sanjay

From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
Date: Tuesday, July 30, 2013 4:26 PM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Review Request (wikidoc): LZO Compression in Hive

Hi

I met with Lefty this afternoon and she was kind enough to spend time adding my documentation to the site, since I still don't have editing privileges :-) Please review the new wikidoc about LZO compression in the Hive language manual. If anything is unclear or needs more information, you can email suggestions to this list or edit the wiki yourself (if you have editing privileges). Here are the links:

1. Language Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual (new bullet under File Formats)
2. LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
3. CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (near the end of the section, pasted in here:)

Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStorage (https://cwiki.apache.org/confluence/display/Hive/CompressedStorage) if you are planning to keep data compressed in your Hive tables. Use INPUTFORMAT and OUTPUTFORMAT to specify the name of a corresponding InputFormat and OutputFormat class as a string literal, e.g., 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'.
For LZO compression, the values to use are INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' (see LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO).

My cwiki id is https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com It would be great if I could get edit privileges.

Thanks
sanjay
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Thanks Nitin

There aren't too many connections in CLOSE_WAIT state, only one or two when we run into this. Most likely it's because of a dropped connection. I could not find any read or write timeouts we can set for the Thrift server which would tell Thrift to hold on to the client connection. See https://issues.apache.org/jira/browse/HIVE-2006, but that doesn't seem to have been implemented yet. We do have a client connection timeout set, but cannot find an equivalent setting for the server.

We have a suspicion that this happens when we run two client processes which modify two distinct partitions of the same Hive table. We put in a workaround so that the two Hive client processes never run together, and so far things look ok, but we will keep monitoring. Could it be because the Hive metastore server is not thread safe? Would running two ALTER TABLE statements on two distinct partitions of the same table, over two client connections, cause problems like these, where the Hive metastore server closes or drops the wrong client connection and leaves the other hanging?

Agateaaa

On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

The mentioned flow is called when you have an unsecure mode of Thrift metastore client-server connection. So one way to avoid this is to use a secure one.

    public boolean process(final TProtocol in, final TProtocol out) throws TException {
      setIpAddress(in);
      ...

    @Override
    protected void setIpAddress(final TProtocol in) {
      TUGIContainingTransport ugiTrans = (TUGIContainingTransport) in.getTransport();
      Socket socket = ugiTrans.getSocket();
      if (socket != null) {
        setIpAddress(socket);

From the above code snippet, it looks like the null pointer exception is not handled if getSocket returns null. Can you check what the ulimit setting on the server is? If it's set to the default, can you set it to unlimited and restart the hcat server? (This is just a wild guess.)
Also, the getSocket method's documentation says: if the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains; otherwise it returns null. So someone from the Thrift gurus needs to tell us what's happening; I have no knowledge at this depth. Maybe Ashutosh or Thejas will be able to help on this. From the netstat CLOSE_WAIT, it looks like the Hive metastore server has not closed the connection (do not know why yet); maybe the Hive dev guys can help. Are there too many connections in CLOSE_WAIT state?

On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

Looking at the hive metastore server logs we see errors like these:

2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

at approximately the same time as we see the timeout or connection reset errors. Don't know if this is the cause or a side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks

On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:

Thanks Nitin! We have a similar setup (identical HCatalog and Hive server versions) on another production environment and don't see any errors (it's been running ok for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon.
I did see, the last time we ran into this problem, doing a netstat -ntp | grep :1, that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (the hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any Hive configuration settings we can tweak, or networking tools/tips we can use to narrow this down? Thanks

Agateaaa

On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0.11 (or 0.10) and see if you can reproduce the issue? We used to see this error when there was load on the hcat server, or some network issue connecting to the server (the second one was a rare occurrence).

On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote:

Hi All: We are running into a frequent problem using HCatalog 0.4.1
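To watch for connections stuck in CLOSE_WAIT like the one described in this thread, the output of netstat can be tallied per TCP state. A minimal sketch, assuming the usual Linux `netstat -nt` column layout (the sample lines exercising it in the comment below are invented, not taken from the poster's environment):

```python
from collections import Counter

def count_states(netstat_output):
    """Count TCP connection states in `netstat -nt`-style output.

    Assumes the common Linux layout where the state (ESTABLISHED,
    CLOSE_WAIT, ...) is the sixth whitespace-separated column of each
    line that starts with "tcp". Header lines are skipped because they
    do not start with "tcp".
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# e.g. count_states(open("netstat.txt").read())["CLOSE_WAIT"]
```

Sampling this periodically would show whether CLOSE_WAIT sockets accumulate only when the two client processes run concurrently, which is the suspicion raised above.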
Re: Re: BUG IN HIVE-4650 seems not fixed
Seems it is another problem. Can you try

SELECT *
FROM (SELECT VAL001 x1, VAL002 x2, VAL003 x3, VAL004 x4, VAL005 y
      FROM (SELECT /*+ mapjoin(v2) */
                   (VAL001 - mu1) * 1/(sd1) VAL001,
                   (VAL002 - mu2) * 1/(sd2) VAL002,
                   (VAL003 - mu3) * 1/(sd3) VAL003,
                   (VAL004 - mu4) * 1/(sd4) VAL004,
                   (VAL005 - mu5) * 1/(sd5) VAL005
            FROM (SELECT x1 VAL001, x2 VAL002, x3 VAL003, x4 VAL004, y VAL005 FROM cmnt) v3
            JOIN (SELECT count(*) c,
                         avg(VAL001) mu1, avg(VAL002) mu2, avg(VAL003) mu3, avg(VAL004) mu4, avg(VAL005) mu5,
                         stddev_pop(VAL001) sd1, stddev_pop(VAL002) sd2, stddev_pop(VAL003) sd3, stddev_pop(VAL004) sd4, stddev_pop(VAL005) sd5
                  FROM (SELECT * FROM (SELECT x1 VAL001, x2 VAL002, x3 VAL003, x4 VAL004, y VAL005 FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6;

Also, cmnt in v3 will be used to create the hash table. It seems the part of the code that converts a Join to a MapJoin does not play well with this part of your original query:

(SELECT * FROM (SELECT x1 VAL001, x2 VAL002, x3 VAL003, x4 VAL004, y VAL005 FROM cmnt) obj1_3) v3

I have created https://issues.apache.org/jira/browse/HIVE-4968 to address this issue.

On Sun, Jul 28, 2013 at 11:46 PM, wzc1...@gmail.com wrote:

Hi: I attach the output of EXPLAIN. The Hive I use is compiled from trunk, my Hadoop version is 1.0.1, and I use the default Hive configuration.

-- wzc1...@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Monday, July 29, 2013 at 1:08 PM, Yin Huai wrote:

Hi, Can you also post the output of EXPLAIN? The execution plan may be helpful in locating the problem.
Thanks, Yin

On Sun, Jul 28, 2013 at 8:06 PM, wzc1...@gmail.com wrote:

What I mean by "not pass the testcase in HIVE-4650" is that I compiled the trunk code and ran the query in HIVE-4650:

SELECT *
FROM (SELECT VAL001 x1, VAL002 x2, VAL003 x3, VAL004 x4, VAL005 y
      FROM (SELECT /*+ mapjoin(v2) */
                   (VAL001 - mu1) * 1/(sd1) VAL001,
                   (VAL002 - mu2) * 1/(sd2) VAL002,
                   (VAL003 - mu3) * 1/(sd3) VAL003,
                   (VAL004 - mu4) * 1/(sd4) VAL004,
                   (VAL005 - mu5) * 1/(sd5) VAL005
            FROM (SELECT * FROM (SELECT x1 VAL001, x2 VAL002, x3 VAL003, x4 VAL004, y VAL005 FROM cmnt) obj1_3) v3
            JOIN (SELECT count(*) c,
                         avg(VAL001) mu1, avg(VAL002) mu2, avg(VAL003) mu3, avg(VAL004) mu4, avg(VAL005) mu5,
                         stddev_pop(VAL001) sd1, stddev_pop(VAL002) sd2, stddev_pop(VAL003) sd3, stddev_pop(VAL004) sd4, stddev_pop(VAL005) sd5
                  FROM (SELECT * FROM (SELECT x1 VAL001, x2 VAL002, x3 VAL003, x4 VAL004, y VAL005 FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6;

and it still fails at the same place:

…
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
    at
Re: Re: BUG IN HIVE-4650 seems not fixed
I just uploaded a patch to https://issues.apache.org/jira/browse/HIVE-4968. You can try it and see if the problem has been resolved for your query.
error in documentation of RLIKE?
From here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-RelationalOperators

A RLIKE B (strings): NULL if A or B is NULL; TRUE if any (possibly empty) substring of A matches the Java regular expression B, otherwise FALSE. E.g. 'foobar' RLIKE 'foo' evaluates to FALSE, whereas 'foobar' RLIKE '^f.*r$' evaluates to TRUE.

'foobar' RLIKE 'foo' evaluates to TRUE, doesn't it?

--Darren
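Darren appears to be right: the wiki's own definition says RLIKE is TRUE if any substring of A matches B, so 'foobar' RLIKE 'foo' should be TRUE and the wiki's "FALSE" example contradicts its definition. A rough Python analogue of the substring-match semantics (Python's re dialect is close to, but not identical to, the Java regex dialect Hive actually uses):

```python
import re

def rlike(a, b):
    """Approximate Hive's A RLIKE B.

    re.search looks for a match anywhere in the string, which mirrors
    the "any substring of A matches B" wording in the Hive wiki.
    Returns None to mirror Hive returning NULL on a NULL operand.
    """
    if a is None or b is None:
        return None
    return re.search(b, a) is not None
```

Under this reading, rlike('foobar', 'foo') is True, matching Darren's expectation rather than the wiki's example, and rlike('foobar', '^f.*r$') is also True since the anchored pattern matches the whole string.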
Hive index error
I'm facing issues while building an index on multiple columns in a Hive (0.9.0) table.

describe nas_comps;
OK
leg_id          int
ds_name         string
dep_date        string
crr_code        string
flight_no       string
orgn            string
dstn            string
physical_cap    int
adjusted_cap    int
closed_cap      int
comp_code       string

This works:

CREATE INDEX nas_comps_legid ON TABLE nas_comps (leg_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

But this doesn't:

CREATE INDEX nas_comps_legid_compcode ON TABLE nas_comps (leg_id, comp_code)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, they should appear in the table being indexed.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I guess the index is somehow able to recognize only the first column, because even this failed:

CREATE INDEX nas_comps_compcode ON TABLE nas_comps (comp_code)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, they should appear in the table being indexed.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I checked this issue (https://issues.apache.org/jira/browse/HIVE-4251) but I don't think it is the cause.

Regards,
Omkar Joshi