RE: Select not working with Index patch

Sachin Bochare Sun, 02 May 2010 21:29:06 -0700

Thanks, Edward.

I was exploring the indexing patch and wanted to see what the index table looks 
like.


A few points that I did not mention in my earlier post:

1. I haven't created any index on the test table, so the index logic is not 
involved here. The query fails on a plain, unindexed table.
2. I checked the job in the JobTracker web interface and found that Hadoop read 
277 bytes and wrote 23 bytes: the counters are HDFS_BYTES_READ=277 and 
HDFS_BYTES_WRITTEN=23. The input file is 277 bytes and the expected result is 
23 bytes, so Hadoop produced the correct output. For some reason, however, Hive 
did not receive or display those results.

I guess it would be a minor code change. I'd like to identify the code and fix it 
temporarily in my sandbox. Could someone please point me to the module where I 
should look for this issue?

Regards,
Sachin

________________________________
From: Edward Capriolo [mailto:[email protected]]
Sent: Sunday, May 02, 2010 8:08 PM
To: [email protected]
Subject: Re: Select not working with Index patch


On Sun, May 2, 2010 at 2:29 AM, Sachin Bochare <[email protected]> wrote:
Hi,

I applied the index patch available at:
https://issues.apache.org/jira/browse/HIVE-678

However, after applying the indexing patch, simple select statements do not 
return any results. "select *" works, but selecting a specific column does not. 
I have pasted an example below that illustrates the problem.

The same select works without the patch on the same metastore_db. The only 
difference between the working and the non-working code is the patch.

I used revision 796926 of the code; the patch attached to HIVE-678 was created 
against this revision.

The following example illustrates the problem:

Example with the patched code:
------------------------------

=====================================
hive> create table ourtest (empid int, firstname string, lastname string, 
hoursworked int) partitioned by(dt string, place string) clustered by (empid) 
sorted by(hoursworked) into 4 buckets row format delimited fields terminated by 
',' stored as textfile;
OK
Time taken: 0.307 seconds
hive> LOAD DATA LOCAL INPATH '/root/data/ourtest_data.csv' INTO TABLE ourtest 
PARTITION(dt='2010-02-27', place='Pune');
Copying data from file:/root/data/ourtest_data.csv
Loading data to table ourtest partition {dt=2010-02-27, place=Pune}
OK
Time taken: 0.753 seconds
hive> select * from ourtest; ---> Select * is working fine.
OK
0       firstname       lastname        0       2010-02-27      Pune
1       firstname1      lastname1       1       2010-02-27      Pune
2       firstname2      lastname2       2       2010-02-27      Pune
3       firstname3      lastname3       3       2010-02-27      Pune
4       firstname4      lastname4       4       2010-02-27      Pune
5       firstname5      lastname5       5       2010-02-27      Pune
6       firstname6      lastname6       6       2010-02-27      Pune
7       firstname7      lastname7       7       2010-02-27      Pune
8       firstname8      lastname8       8       2010-02-27      Pune
9       firstname9      lastname9       9       2010-02-27      Pune
10      firstname10     lastname10      10      2010-02-27      Pune
Time taken: 0.106 seconds
hive> select empid from ourtest; ---> Selecting specific column is not working.
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0170, Tracking URL = 
http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0170
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0170
2010-05-02 08:40:48,951 map = 0%,  reduce =0%
2010-05-02 08:40:58,044 map = 50%,  reduce =0%
2010-05-02 08:40:59,057 map = 100%,  reduce =0%
2010-05-02 08:41:02,067 map = 100%,  reduce =100%
Ended Job = job_201002091652_0170
OK
Time taken: 15.494 seconds
=====================================

Example with the unpatched code:
--------------------------------
The same query works with the unpatched code on the same metastore_db.

=====================================
r...@v-hadoop3:~/sachin/Hive-796926-Patch# ../Hive-796926/build/dist/bin/hive
Hive history file=/tmp/root/hive_job_log_root_201005020928_924651644.txt
hive> select empid from ourtest;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201002091652_0190, Tracking URL = 
http://v-hadoop3.persistent.co.in:60030/jobdetails.jsp?jobid=job_201002091652_0190
Kill Command = /root/hadoop-0.20.1/bin/../bin/hadoop job -Dmapred.job.tracker=v-hadoop3.persistent.co.in:30001 -kill job_201002091652_0190
2010-05-02 09:29:04,733 map = 0%,  reduce =0%
2010-05-02 09:29:18,799 map = 100%,  reduce =0%
2010-05-02 09:29:21,823 map = 100%,  reduce =100%
Ended Job = job_201002091652_0190
OK
0
1
2
3
4
5
6
7
8
9
10
Time taken: 22.268 seconds
=====================================

Can anyone point out what the problem might be? Which module is the likely 
suspect?

Regards,
Sachin

DISCLAIMER ========== This e-mail may contain privileged and confidential 
information which is the property of Persistent Systems Ltd. It is intended 
only for the use of the individual or entity to which it is addressed. If you 
are not the intended recipient, you are not authorized to read, retain, copy, 
print, distribute or use this message. If you have received this communication 
in error, please notify the sender and delete all copies of this message. 
Persistent Systems Ltd. does not accept any liability for virus infected mails.

The comments on that issue suggest the patch is not complete yet. For 
reference, 'select *' queries simply read the block data from HDFS, so they do 
not use MapReduce (and thus probably do not use any indexes either).
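To picture the difference described above, here is a toy Python model of the 
two read paths. It is not Hive's code: the function names, the "key=value" 
partition directory layout, and the tab-separated display are assumptions for 
illustration. "select *" is modeled as reading the file directly and appending 
the partition values; "select empid" is modeled as a map-only job that emits one 
column per row, followed by a separate step in which the client reads the job 
output back.

```python
# Toy model (NOT Hive's implementation) of the two read paths.
# ASSUMPTIONS: the partition directory encodes dt and place as
# "dt=.../place=...", and the data file uses "," as the field delimiter.
# All names here are hypothetical.

PARTITION_DIR = "ourtest/dt=2010-02-27/place=Pune"


def partition_values(path):
    """Extract partition column values from 'key=value' path components."""
    return [part.split("=", 1)[1] for part in path.split("/") if "=" in part]


def fetch_all(lines, path):
    """'select *' path: read rows directly, append partition values, no job."""
    extra = partition_values(path)
    return ["\t".join(line.rstrip("\n").split(",") + extra) for line in lines]


def map_only_project(lines, column):
    """'select empid' path, step 1: a map task writes one column per row."""
    return [line.rstrip("\n").split(",")[column] + "\n" for line in lines]


def client_read(job_output):
    """Step 2: the client reads the job's output back for display."""
    return [line.rstrip("\n") for line in job_output]


if __name__ == "__main__":
    data = ["0,firstname,lastname,0\n", "1,firstname1,lastname1,1\n"]
    for row in fetch_all(data, PARTITION_DIR):
        print(row)
    print(client_read(map_only_project(data, 0)))  # ['0', '1']
```

Only the projection path has the separate read-back step, and the counters in 
Sachin's reply (23 bytes written, nothing displayed) are consistent with the 
rows being lost somewhere after the job finishes rather than inside it.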

Edward
