Hi Eva,

Can you give us two lines of the data so that we can debug?
Also, what does "select count(1) from tablename" return?


Zheng

On Thu, Jul 9, 2009 at 5:38 PM, Eva Tse<[email protected]> wrote:
>
> When we load the output generated by the reducer to hive, we run into some
> issues with ‘is NULL’ and ‘is not NULL’ operators. Both returns zero when we
> issue two queries like select count(1) from tablename where column_name is
> NULL  or select count(1) from tablename where column_name is NOT NULL, which
> shouldn’t be possible. column_name is string type.
>
> We tried using both SequenceFileOutput and TextFileOutput formats and we get
> similar results. We are currently using Hadoop 0.20 unpatched and Hive trunk
> (r786648) w/ HIVE-487 patch.
>
> We suspect it is because of the fileformat that we are loading? But it used
> to work with TextOutputFormat with 0.3 Hive. We have attached the test_log
> table definition as well as how we generate the output files to be loaded
> into this table.
>
> Please let us know if anyone sees anything wrong or hits the same issue w/ a
> workaround, etc.
>
> Thanks in advance,
> Eva.
>
> Describe extended test_log;
> esn     string
> server_utc_ms   bigint
> devtype_id      int
> nccphn  string
> server_msg      string
> other_properties        map<string,string>
> dateint int
> hour    int
>
> Detailed Table Information
> Table(tableName:test_log,dbName:default,owner:dataeng,createTime:1247184142,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:esn,type:string,comment:null),
> FieldSchema(name:server_utc_ms,type:bigint,comment:null),
> FieldSchema(name:devtype_id,type:int,comment:null),
> FieldSchema(name:nccphn,type:string,comment:null),
> FieldSchema(name:server_msg,type:string,comment:null),
> FieldSchema(name:other_properties,type:map<string,string>,comment:null)],location:hdfs://ip-xxxxx.ec2.internal:9000/user/hive/warehouse/test_log,inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,parameters:{colelction.delim=,mapkey.delim=,serialization.format=1,line.delim=
>
> ,field.delim=}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:dateint,type:int,comment:null),
> FieldSchema(name:hour,type:int,comment:null)],parameters:{})
>
>
> Output Format from the reducer:
> SequenceFileOutputFormat
> Key = NullWritable
> Value = Text with: DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS
> TERMINATED BY '\004' MAP KEYS TERMINATED BY '\002'
>
> OR
>
> TextFileOutputFormat
> Key = NullWritable
> Value = Text with: DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS
> TERMINATED BY '\004' MAP KEYS TERMINATED BY '\002' LINES TERMINATED by '\n'
>
>
>



-- 
Yours,
Zheng

Reply via email to