Mike Lewis created HIVE-5273:
--------------------------------

             Summary: Subsequent use of Mapper yields 0 results
                 Key: HIVE-5273
                 URL: https://issues.apache.org/jira/browse/HIVE-5273
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0, 0.13.0
            Reporter: Mike Lewis


I first noticed this when using the local task tracker, which is also the 
easiest way to reproduce it.

I created a table with one column (uuid) and ran:

{code}
SELECT uuid FROM test_foo LIMIT 5;
{code}

Results are as expected:
{code}
ace7265d-49bf-4c11-af67-0cd0a33c690e
ace7265d-49bf-4c11-af67-0cd0a33c690e
ace7265d-49bf-4c11-af67-0cd0a33c690e
ace7265d-49bf-4c11-af67-0cd0a33c690e
ace7265d-49bf-4c11-af67-0cd0a33c690e
Time taken: 40.172 seconds, Fetched: 5 row(s)
{code}

Then I ran the same query again.

This time the results are not as expected; no rows are returned:

{code}
Time taken: 55.498 seconds
{code}

The table I am querying is
{code}
hive> describe extended test_foo;
OK
uuid                    string                  None                
                 
Detailed Table Information      Table(tableName:test_foo, dbName:default, 
owner:lewis, createTime:1378934838, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:uuid, type:string, comment:null)], 
location:hdfs://gun1.sjc1c.square:8020/user/hive/warehouse/test_foo, 
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numPartitions=0, numFiles=37, 
transient_lastDdlTime=1378934838, numRows=0, totalSize=44600654909, 
rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE) 
{code}
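
For anyone trying to reproduce this, the steps above can be sketched roughly as 
follows. This is only a sketch under assumptions: the {{CREATE TABLE}} statement 
is inferred from the table description above (SequenceFile storage, default 
LazySimpleSerDe) and is not the exact DDL I ran, and 
{{SET mapred.job.tracker=local}} is one common way to force local execution on 
Hadoop 1 that may differ from the setup used here. Loading data into the table 
is not shown.

{code}
-- Assumption: force local execution; the report does not record the exact
-- setting that was used.
SET mapred.job.tracker=local;

-- Assumption: DDL inferred from the DESCRIBE EXTENDED output.
CREATE TABLE IF NOT EXISTS test_foo (uuid STRING)
STORED AS SEQUENCEFILE;

-- First run: returns 5 rows as expected.
SELECT uuid FROM test_foo LIMIT 5;

-- Second run in the same session: returns no rows.
SELECT uuid FROM test_foo LIMIT 5;
{code}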

With a non-local task tracker, subsequent queries work, but a {{count(*)}} 
over a large data set on 0.12.0 returns only a subset of the results that 
0.10.0 returns.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
