Thanks Edwards. I was exploring indexing patch and wanted to know how the index table looks like.
Few points which were not mentioned in my earlier posting: 1. I haven't created any index on the test table. So index logic is not involved here. The query is not working on normal table. 2. I checked the jobtracker result in Hadoop web interface and found that Hadoop has read 277 bytes and written 33 bytes. The counters are HDFS_BYTES_READ=277 and HDFS_BYTES_WRITTEN=23. The input file size is 277 bytes and final result is 23 bytes. So Hadoop had returned the correct output. However for some reason Hive hasn't received or returned those results. I guess it would be a minor code change. I want to identify the code and fix it temporary in my send box. Could someone please point to the module where I should look for this issue? Regards, Sachin ________________________________ From: Edward Capriolo [mailto:[email protected]] Sent: Sunday, May 02, 2010 8:08 PM To: [email protected] Subject: Re: Select not working with Index patch On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <[email protected]<mailto:[email protected]>> wrote: Hi, I applied index patch available at : https://issues.apache.org/jira/browse/HIVE-678 However after applying the indexing patch, simple select statements are not showing any results. The "select *" is working but selecting a specific column is not working. I have pasted an example below which illustrates the problem. The same select is working without the patch on the same metastore_db. The only difference between working code and non-working code is the patch. I used 796926 version of the code. The patch attached in HIVE-678 was created on this version. Following example illustrates the problem: Example with patch code: ----------------------------- ===================================== hive> create table ourtest (empid int, firstname string, lastname string, hoursworked int) partitioned by(dt string, place string) clustered by (empid) sorted by(hoursworked) into 4 buckets row format delimited fields terminated by ',' stored as textfile; OK Time taken: 0.307 seconds hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO TABLE ourtest PARTITION(dt='2010-02-27', place='Pune'); Copying data from file:/root/data/ourtest_data.csv Loading data to table ourtest partition {dt=2010-02-27, place=Pune} OK Time taken: 0.753 seconds hive> select * from ourtest; ---> Select * is working fine. OK 0 firstname lastname 0 2010-02-27 Pune 1 firstname1 lastname1 1 2010-02-27 Pune 2 firstname2 lastname2 2 2010-02-27 Pune 3 firstname3 lastname3 3 2010-02-27 Pune 4 firstname4 lastname4 4 2010-02-27 Pune 5 firstname5 lastname5 5 2010-02-27 Pune 6 firstname6 lastname6 6 2010-02-27 Pune 7 firstname7 lastname7 7 2010-02-27 Pune 8 firstname8 lastname8 8 2010-02-27 Pune 9 firstname9 lastname9 9 2010-02-27 Pune 10 firstname10 lastname10 10 2010-02-27 Pune Time taken: 0.106 seconds hive> select empid from ourtest; ---> Selecting specific column is not working. Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201002091652_0170, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170 Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001<http://v-hadoop3.persistent.co.in:30001> -kill job_201002091652_0170 2010-05-02 08:40:48,951 map = 0%, reduce =0% 2010-05-02 08:40:58,044 map = 50%, reduce =0% 2010-05-02 08:40:59,057 map = 100%, reduce =0% 2010-05-02 08:41:02,067 map = 100%, reduce =100% Ended Job = job_201002091652_0170 OK Time taken: 15.494 seconds ===================================== Example without patch code: -------------------------------- Example query is working after using without-patch code on the same metastore_db. ===================================== r...@v-hadoop3<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>:~/<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx>sachin/Hive-796926-Patch<https://puneexchange.persistent.co.in/owa/UrlBlockedError.aspx># ../Hive-796926/build/dist/bin/hive Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt hive> select empid from ourtest; Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201002091652_0190, Tracking URL = http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190 Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001<http://v-hadoop3.persistent.co.in:30001> -kill job_201002091652_0190 2010-05-02 09:29:04,733 map = 0%, reduce =0% 2010-05-02 09:29:18,799 map = 100%, reduce =0% 2010-05-02 09:29:21,823 map = 100%, reduce =100% Ended Job = job_201002091652_0190 OK 0 1 2 3 4 5 6 7 8 9 10 Time taken: 22.268 seconds ===================================== Can anyone point to what can be the problem here? Which module is a suspect here? Regards, Sachin DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails. The comments for that issue seem to suggest the patch is not complete yet. For reference 'select *' queries simply read that block data from hdfs so they do not use map-reduce (and thus probably do not use any indexes either. Edward DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
