Hi Eva,

Can you give us two lines of the data so that we can debug? Also, what does "select count(1) from tablename" return?
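For example, something along these lines would help narrow it down (I'm using the esn column from your test_log definition below purely as an illustration; any string column that can contain NULLs would do):

  -- two raw rows, to see exactly what the loaded data looks like
  SELECT * FROM test_log LIMIT 2;

  -- overall row count, to compare against the NULL predicate counts
  SELECT COUNT(1) FROM test_log;
  SELECT COUNT(1) FROM test_log WHERE esn IS NULL;
  SELECT COUNT(1) FROM test_log WHERE esn IS NOT NULL;

If count(1) itself comes back as zero, the rows are not being read at all; if it is non-zero while the last two queries both return zero, the problem is in how the predicates are evaluated.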
Zheng

On Thu, Jul 9, 2009 at 5:38 PM, Eva Tse <[email protected]> wrote:
>
> When we load the output generated by the reducer into Hive, we run into
> some issues with the ‘is NULL’ and ‘is not NULL’ operators. Both return
> zero when we issue queries like select count(1) from tablename where
> column_name is NULL or select count(1) from tablename where column_name
> is NOT NULL, which shouldn’t be possible. column_name is of string type.
>
> We tried using both the SequenceFileOutput and TextFileOutput formats
> and got similar results. We are currently using unpatched Hadoop 0.20
> and Hive trunk (r786648) with the HIVE-487 patch.
>
> We suspect it is related to the file format we are loading, but it used
> to work with TextOutputFormat on Hive 0.3. We have attached the test_log
> table definition as well as how we generate the output files to be
> loaded into this table.
>
> Please let us know if anyone sees anything wrong, or has hit the same
> issue and found a workaround.
>
> Thanks in advance,
> Eva.
>
> Describe extended test_log;
> esn                 string
> server_utc_ms       bigint
> devtype_id          int
> nccphn              string
> server_msg          string
> other_properties    map<string,string>
> dateint             int
> hour                int
>
> Detailed Table Information
> Table(tableName:test_log,dbName:default,owner:dataeng,createTime:1247184142,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:esn,type:string,comment:null),
> FieldSchema(name:server_utc_ms,type:bigint,comment:null),
> FieldSchema(name:devtype_id,type:int,comment:null),
> FieldSchema(name:nccphn,type:string,comment:null),
> FieldSchema(name:server_msg,type:string,comment:null),
> FieldSchema(name:other_properties,type:map<string,string>,comment:null)],location:hdfs://ip-xxxxx.ec2.internal:9000/user/hive/warehouse/test_log,inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,parameters:{colelction.delim=,mapkey.delim=,serialization.format=1,line.delim=
>
> ,field.delim=}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:dateint,type:int,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
> Output format from the reducer:
>
> SequenceFileOutputFormat
> Key = NullWritable
> Value = Text with: DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS
> TERMINATED BY '\004' MAP KEYS TERMINATED BY '\002'
>
> OR
>
> TextFileOutputFormat
> Key = NullWritable
> Value = Text with: DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS
> TERMINATED BY '\004' MAP KEYS TERMINATED BY '\002' LINES TERMINATED BY '\n'

--
Yours, Zheng
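For reference, the DESCRIBE EXTENDED output quoted above corresponds to roughly the following DDL. This is only a sketch reconstructed from the column list, the partition keys, and the delimiters described for the reducer output; it is not necessarily the statement that was actually run:

  CREATE TABLE test_log (
    esn              STRING,
    server_utc_ms    BIGINT,
    devtype_id       INT,
    nccphn           STRING,
    server_msg       STRING,
    other_properties MAP<STRING, STRING>
  )
  PARTITIONED BY (dateint INT, hour INT)
  -- delimiters taken from the reducer output description above
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\004'
    MAP KEYS TERMINATED BY '\002'
  STORED AS SEQUENCEFILE;

STORED AS SEQUENCEFILE is what expands to the SequenceFileInputFormat / HiveSequenceFileOutputFormat pair shown in the table metadata.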
