Re: Select statements return null

2013-07-31 Thread Matouk IFTISSEN
Hello Sanita,

If you are using JSON, try adding the jar 'hive-json-serde.jar' before you
load your data into the final table. Also try making your date attributes
String type first, to rule them out as the cause.
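For reference, a minimal sketch of that setup (the serde class name depends on which JSON serde jar you actually use, and the table, column, and path names here are made up):

```sql
-- Hypothetical example: register the serde jar, then define a table over
-- the raw JSON files. Adjust the serde class to match your jar.
ADD JAR /path/to/hive-json-serde.jar;

CREATE EXTERNAL TABLE json_events (
  event_name STRING,
  event_date STRING   -- keep dates as STRING first, as suggested above
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/user/hive/json_events';
```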

I don't know whether you are using an external table with regular expressions
(regexp) to parse your data; if so, can you send us the table definition and
the structure of a row from your data?
The last approach I can suggest is to run a MapReduce operation over the
table (select count(1) from your_table) and then check the JobTracker log
to debug the issue.

Hope this helps ;)




2013/7/30 Sunita Arvind sunitarv...@gmail.com

 Hi,

 I have written a script which generates JSON files, loads them into a
 dictionary, adds a few attributes, and uploads the modified files to HDFS.
 After the files are generated, if I perform a select * from..; on the table
 which points to this location, I get null, null as the result. I also
 tried without the added attributes and it did not make a difference. I
 strongly suspect the data.
 Currently I am using strip() to eliminate trailing and leading whitespaces
 and newlines. Wondering whether embedded newlines (that is, JSON string
 values containing \n) cause such issues.
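That suspicion is easy to check outside Hive. Hive's text input format splits records on newlines, so a literal newline inside a JSON value turns one record into two unparseable lines, while a properly escaped \n keeps the record on a single physical line. A sketch with made-up field names:

```python
import json

# A value containing a real newline character.
record = {"title": "line one\nline two", "year": 2013}

# json.dumps escapes the newline as the two characters '\' and 'n',
# so the serialized record stays on a single line and is safe for a
# line-oriented text input format.
serialized = json.dumps(record)
assert "\n" not in serialized   # one physical line
assert "\\n" in serialized      # the escape sequence survives

# Writing the raw, unescaped value would split the record across two
# lines, and neither half parses as JSON on its own.
raw = '{"title": "line one\nline two", "year": 2013}'
lines = raw.split("\n")
assert len(lines) == 2          # two physical "records"
```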
 There are no parsing errors, so I am not able to debug this issue. Are
 there any flags that I can set to figure out what is happening within the
 parser code?

 I set this:
 hive -hiveconf hive.root.logger=DEBUG,console

 But the output is not really useful:

 blocks=[LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
 getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
 192.168.1.66:50010, 192.168.1.63:50010]}]

 lastLocatedBlock=LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
 getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
 192.168.1.66:50010, 192.168.1.63:50010]}
   isLastBlockComplete=true}
 13/07/30 11:49:41 DEBUG hdfs.DFSClient: Connecting to datanode
 192.168.1.61:50010
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 null
 13/07/30 11:49:41 INFO exec.

 Also, the attributes I am adding are the current year, month, day, and
 time, so they are not null for any record. I even moved the existing files
 which did not have these fields set, so that there are no records with these
 fields as null. However, I don't think this is the issue, as the advantage
 of JSON and the Hive JSON serde is that they allow the object structure to
 be dynamic. Right?
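The dynamic-structure assumption itself is easy to sanity-check outside Hive: JSON serdes generally return NULL for a field that is absent from a record rather than failing, much like a dictionary lookup with a default. Field names here are hypothetical:

```python
import json

# Two records: one with the added date fields, one without.
with_fields = json.loads('{"id": 1, "year": 2013, "month": 7}')
without_fields = json.loads('{"id": 2}')

# A serde-style lookup: absent keys yield None (Hive's NULL);
# they do not raise an error.
for rec in (with_fields, without_fields):
    row = (rec.get("id"), rec.get("year"), rec.get("month"))
    print(row)
# The second row is (2, None, None), not a parse failure.
```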

 Any suggestion regarding debugging would be very helpful.

 thanks
 Sunita



Re: UDFs with package names

2013-07-31 Thread Michael Malak
Yup, it was the directory structure com/mystuff/whateverUDF.class that was
missing. I thought I had tried that before posting my question, but...

Thanks for your help!



 From: Edward Capriolo edlinuxg...@gmail.com
To: user@hive.apache.org; Michael Malak michaelma...@yahoo.com
Sent: Tuesday, July 30, 2013 7:06 PM
Subject: Re: UDFs with package names
 


It might be a better idea to use your own package, com.mystuff.x. You might
be running into an issue where Java is not finding the file because it
assumes the relation between package and jar is 1 to 1. You might also be
compiling it wrong: if your package is com.mystuff, that class file should
be in a directory structure com/mystuff/whateverUDF.class, and I am not
seeing that in your example.
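The mapping Edward describes can be sketched mechanically: a class loader looks a class up inside the jar at package.replace('.', '/') + '/ClassName.class', so the jar entry must carry the directory prefix. The names below are the hypothetical ones from this thread, and a jar is modeled as a plain zip file:

```python
import io
import zipfile

# The entry path a class loader expects for com.mystuff.WhateverUDF.
package, cls = "com.mystuff", "WhateverUDF"
expected = package.replace(".", "/") + "/" + cls + ".class"
assert expected == "com/mystuff/WhateverUDF.class"

# Build an in-memory "jar" (a jar is just a zip) with the right layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr(expected, b"\xca\xfe\xba\xbe")  # placeholder class bytes

# The class is findable only because the entry keeps the package prefix;
# a bare "WhateverUDF.class" at the zip root would not be found.
with zipfile.ZipFile(buf) as jar:
    assert expected in jar.namelist()
```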




On Tue, Jul 30, 2013 at 8:00 PM, Michael Malak michaelma...@yahoo.com wrote:

Thus far, I've been able to create Hive UDFs, but now I need to define them
within a Java package (as opposed to the default Java package, as I had been
doing). Once I do that, I'm no longer able to load them into Hive.

First off, this works:

add jar /usr/lib/hive/lib/hive-contrib-0.10.0-cdh4.3.0.jar;
create temporary function row_sequence as 
'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

Then I took the source code for UDFRowSequence.java from
http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java

and renamed the file and the class inside to UDFRowSequence2.java

I compile and deploy it with:
javac -cp 
/usr/lib/hive/lib/hive-exec-0.10.0-cdh4.3.0.jar:/usr/lib/hadoop/hadoop-common.jar
 UDFRowSequence2.java
jar cvf UDFRowSequence2.jar UDFRowSequence2.class
sudo cp UDFRowSequence2.jar /usr/local/lib


But in Hive, I get the following:
hive> add jar /usr/local/lib/UDFRowSequence2.jar;
Added /usr/local/lib/UDFRowSequence2.jar to class path
Added resource: /usr/local/lib/UDFRowSequence2.jar
hive> create temporary function row_sequence as
'org.apache.hadoop.hive.contrib.udf.UDFRowSequence2';
FAILED: Class org.apache.hadoop.hive.contrib.udf.UDFRowSequence2 not found
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.FunctionTask

But if I comment out the package line in UDFRowSequence2.java (to put the UDF 
into the default Java package), it works:
hive> add jar /usr/local/lib/UDFRowSequence2.jar;
Added /usr/local/lib/UDFRowSequence2.jar to class path
Added resource: /usr/local/lib/UDFRowSequence2.jar
hive> create temporary function row_sequence as 'UDFRowSequence2';
OK
Time taken: 0.383 seconds

What am I doing wrong?  I have a feeling it's something simple.



Re: Review Request (wikidoc): LZO Compression in Hive

2013-07-31 Thread Sanjay Subramanian
Hi guys

Any chance I could get cwiki update privileges today?

Thanks

sanjay

From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
Date: Tuesday, July 30, 2013 4:26 PM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Review Request (wikidoc): LZO Compression in Hive

Hi

Met with Lefty this afternoon, and she was kind enough to spend time adding
my documentation to the site, since I still don't have editing privileges :-)

Please review the new wikidoc about LZO compression in the Hive language 
manual.  If anything is unclear or needs more information, you can email 
suggestions to this list or edit the wiki yourself (if you have editing 
privileges).  Here are the links:

  1.  Language Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
      (new bullet under File Formats)
  2.  LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
  3.  CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
      (near the end of the section, pasted in here:)
Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use
STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more
about CompressedStorage
(https://cwiki.apache.org/confluence/display/Hive/CompressedStorage) if you
are planning to keep data compressed in your Hive tables. Use INPUTFORMAT and
OUTPUTFORMAT to specify the name of a corresponding InputFormat and
OutputFormat class as a string literal, e.g.,
'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For
LZO compression, the values to use are INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' (see LZO
Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO).
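Putting the quoted guidance together, an LZO-backed table definition would look roughly like this (the table and column names are made up; the two format classes are the ones named above):

```sql
CREATE TABLE lzo_logs (
  line STRING
)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```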

My cwiki id is
https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com
It would be great if I could get edit privileges.

Thanks
sanjay



Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-31 Thread agateaaa
Thanks Nitin

There aren't too many connections in CLOSE_WAIT state, only one or two when
we run into this. Most likely it's because of a dropped connection.

I could not find any read or write timeouts we can set for the thrift
server which would tell thrift to hold on to the client connection.
See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem to
have been implemented yet. We do have a client connection timeout set, but
cannot find an equivalent setting for the server.
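For reference, the client-side timeout mentioned above is typically set in hive-site.xml via hive.metastore.client.socket.timeout (the value below is just an example, in seconds); as noted, there is no obvious server-side equivalent in this Hive version:

```xml
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- example: how long the client waits on a metastore socket read -->
  <value>600</value>
</property>
```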

We have a suspicion that this happens when we run two client processes
which modify two distinct partitions of the same Hive table. We put in a
workaround so that the two Hive client processes never run together, and so
far things look OK, but we will keep monitoring.

Could it be that the Hive metastore server is not thread safe? Would running
two alter table statements on two distinct partitions of the same table
using two client connections cause problems like these, where the Hive
metastore server closes or drops the wrong client connection and leaves the
other hanging?

Agateaaa




On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 The mentioned flow is invoked when you use the insecure mode of the thrift
 metastore client-server connection, so one way to avoid this is to use a
 secure connection.

 code
 public boolean process(final TProtocol in, final TProtocol out) throws TException {
   setIpAddress(in);
   ...
 }

 @Override
 protected void setIpAddress(final TProtocol in) {
   TUGIContainingTransport ugiTrans = (TUGIContainingTransport) in.getTransport();
   Socket socket = ugiTrans.getSocket();
   if (socket != null) {
     setIpAddress(socket);
     ...
 /code


 From the above code snippet, it looks like the null pointer exception is
 not handled if getSocket returns null.

 Can you check what the ulimit setting is on the server? If it's set to the
 default, can you set it to unlimited and restart the hcat server? (This is
 just a wild guess.)

 Also, the getSocket method's documentation says: if the underlying
 TTransport is an instance of TSocket, it returns the Socket object which it
 contains; otherwise it returns null.

 So someone from the thrift gurus needs to tell us what's happening; I have
 no knowledge at this depth.

 Maybe Ashutosh or Thejas will be able to help on this.




 From the netstat CLOSE_WAIT output, it looks like the hive metastore server
 has not closed the connection (I do not know why yet); maybe the hive dev
 guys can help. Are there too many connections in CLOSE_WAIT state?



 On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

  Looking at the hive metastore server logs see errors like these:
 
  2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
  (TThreadPoolServer.java:run(182)) - Error occurred during processing of
  message.
  java.lang.NullPointerException
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
  at
 
 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
 
  at approximately the same time as we see timeout or connection reset errors.
 
  Don't know if this is the cause or a side effect of the connection
  timeout/connection reset errors. Does anybody have any pointers or
  suggestions?
 
  Thanks
 
 
  On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:
 
   Thanks Nitin!
  
   We have a similar setup (identical hcatalog and hive server versions) on
   another production environment and don't see any errors (it's been running
   ok for a few months).
  
   Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
   0.10 soon.
  
   I did see that the last time we ran into this problem, doing a
   netstat -ntp | grep :1 showed that the server was holding on to one
   socket connection in CLOSE_WAIT state for a long time
   (hive metastore server is running on port 1). Don't know if that's
   relevant here or not.
  
   Can you suggest any hive configuration settings we can tweak, or
   networking tools/tips we can use to narrow this down?
  
   Thanks
   Agateaaa
  
  
  
  
   On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com
  wrote:
  
   Is there any chance you can do an update on a test environment with
   hcat-0.5 and hive-0.11 or 0.10 and see if you can reproduce the issue?
  
   We used to see this error when there was load on the hcat server or some
   network issue connecting to the server (the second one was a rare
   occurrence).
  
  
   On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com
 wrote:
  
   Hi All:
  
    We are running into a frequent problem using HCatalog 0.4.1 

Re: Reply: BUG IN HIVE-4650 seems not fixed

2013-07-31 Thread Yin Huai
This seems to be another problem.
Can you try

SELECT *
FROM (SELECT VAL001 x1,
 VAL002 x2,
 VAL003 x3,
 VAL004 x4,
 VAL005 y
  FROM (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,
   (VAL002- mu2) * 1/(sd2) VAL002,
   (VAL003- mu3) * 1/(sd3) VAL003,
   (VAL004- mu4) * 1/(sd4) VAL004,
   (VAL005- mu5) * 1/(sd5) VAL005
FROM (SELECT x1 VAL001,
 x2 VAL002,
 x3 VAL003,
 x4 VAL004,
 y VAL005
  FROM cmnt) v3
JOIN (SELECT count(*) c,
 avg(VAL001) mu1,
 avg(VAL002) mu2,
 avg(VAL003) mu3,
 avg(VAL004) mu4,
 avg(VAL005) mu5,
 stddev_pop(VAL001) sd1,
 stddev_pop(VAL002) sd2,
 stddev_pop(VAL003) sd3,
 stddev_pop(VAL004) sd4,
 stddev_pop(VAL005) sd5
  FROM (SELECT *
FROM (SELECT x1 VAL001,
 x2 VAL002,
 x3 VAL003,
 x4 VAL004,
 y VAL005
  FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6;

Also, cmnt in v3 will be used to create the hash table. It seems the part of
the code that converts a Join to a MapJoin does not play well with this part
of your original query:

SELECT *
 FROM
   (SELECT x1 VAL001,
   x2 VAL002,
   x3 VAL003,
   x4 VAL004,
   y VAL005
FROM cmnt) obj1_3) v3


I have created https://issues.apache.org/jira/browse/HIVE-4968 to address
this issue.




On Sun, Jul 28, 2013 at 11:46 PM, wzc1...@gmail.com wrote:

 Hi:
 I have attached the output of EXPLAIN. The Hive I use is compiled from
 trunk, my Hadoop version is 1.0.1, and I use the default Hive configuration.


 --
 wzc1...@gmail.com
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

 On Monday, July 29, 2013, at 1:08 PM, Yin Huai wrote:

 Hi,

 Can you also post the output of EXPLAIN? The execution plan may be helpful
 to locate the problem.

 Thanks,

 Yin


 On Sun, Jul 28, 2013 at 8:06 PM, wzc1...@gmail.com wrote:

 What I mean by not passing the test case in HIVE-4650 is that I compiled
 the trunk code and ran the query in HIVE-4650:
 SELECT *
 FROM
   (SELECT VAL001 x1,
   VAL002 x2,
   VAL003 x3,
   VAL004 x4,
   VAL005 y
FROM
  (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,(VAL002-
 mu2) * 1/(sd2) VAL002,(VAL003- mu3) * 1/(sd3) VAL003,(VAL004- mu4) *
 1/(sd4) VAL004,(VAL005- mu5) * 1/(sd5) VAL005
   FROM
 (SELECT *
  FROM
(SELECT x1 VAL001,
x2 VAL002,
x3 VAL003,
x4 VAL004,
y VAL005
 FROM cmnt) obj1_3) v3
   JOIN
 (SELECT count(*) c,
 avg(VAL001) mu1,
 avg(VAL002) mu2,
 avg(VAL003) mu3,
 avg(VAL004) mu4,
 avg(VAL005) mu5,
 stddev_pop(VAL001) sd1,
 stddev_pop(VAL002) sd2,
 stddev_pop(VAL003) sd3,
 stddev_pop(VAL004) sd4,
 stddev_pop(VAL005) sd5
  FROM
(SELECT *
 FROM
   (SELECT x1 VAL001,
   x2 VAL002,
   x3 VAL003,
   x4 VAL004,
   y VAL005
FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6 ;

 and it still fails at the same place:
 …
 Diagnostic Messages for this Task:
 java.lang.RuntimeException:
 org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
 java.lang.NullPointerException
 at
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
 at
 

Re: Reply: BUG IN HIVE-4650 seems not fixed

2013-07-31 Thread Yin Huai
I just uploaded a patch to https://issues.apache.org/jira/browse/HIVE-4968.
You can try it and see if the problem has been resolved for your query.



error in documentation of RLIKE?

2013-07-31 Thread Darren Yin
from here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-RelationalOperators
 A RLIKE B (operand types: strings): NULL if A or B is NULL, TRUE if any
(possibly empty) substring of A matches the Java regular expression B,
otherwise FALSE. E.g. 'foobar' RLIKE 'foo' evaluates to FALSE whereas
'foobar' RLIKE '^f.*r$' evaluates to TRUE.
'foobar' RLIKE 'foo' evaluates to TRUE, doesn't it?
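Darren appears to be right, and it's quick to confirm the semantics: RLIKE is TRUE when any substring matches, i.e. find/search semantics rather than full-string match. Python's re.search mirrors Java's find-anywhere behavior for these simple patterns (a sketch, not Hive itself):

```python
import re

# RLIKE-style predicate: TRUE when any substring of s matches the
# pattern (search semantics), not only when the whole string matches.
def rlike(s, pattern):
    return re.search(pattern, s) is not None

assert rlike("foobar", "foo") is True      # so the wiki's FALSE is wrong
assert rlike("foobar", "^f.*r$") is True   # the anchored pattern also matches
```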

--Darren


Hive index error

2013-07-31 Thread Omkar Joshi
I'm facing issues while building an index on multiple columns in a Hive
(0.9.0) table.

describe nas_comps;

OK

leg_id          int
ds_name         string
dep_date        string
crr_code        string
flight_no       string
orgn            string
dstn            string
physical_cap    int
adjusted_cap    int
closed_cap      int
comp_code       string

This works:

CREATE INDEX nas_comps_legid ON TABLE nas_comps (leg_id) AS 
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED 
REBUILD;

But this doesn't:

CREATE INDEX nas_comps_legid_compcode ON TABLE nas_comps (leg_id,comp_code) 
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED 
REBUILD;



FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, 
they should appear in the table being indexed.

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

I guess the index is somehow able to recognize only the first column,
because even this failed:

CREATE INDEX nas_comps_compcode ON TABLE nas_comps (comp_code) AS 
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED 
REBUILD;



FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, 
they should appear in the table being indexed.

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

I checked this issue (https://issues.apache.org/jira/browse/HIVE-4251) but I
don't think it is the cause.


Regards,
Omkar Joshi

