For one of our Hive tables, I switched from TextFile to SequenceFile format. This is how I created the new table:

    CREATE EXTERNAL TABLE IMPRESSIONS (
        A STRING,
        B STRING)
    PARTITIONED BY (DATA_DATE STRING COMMENT 'yyyyMMdd (e.g. 20090801) on which log records are collected')
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE
    LOCATION '/user/hadoop/warehouse/facts/impressions/';

This external table is populated by our custom ETL job, which writes its output using MultipleSequenceFileOutputFormat.

When I issue a simple query like:

    SELECT * FROM IMPRESSIONS;

this is what I get for every record:

    NULL NULL 20090715
    NULL NULL 20090715
    NULL NULL 20090715
    ....

But if I run:

    hadoop dfs -text /user/hadoop/warehouse/facts/impressions/data_date=20090715/* | less

I get the expected output.

Previously I was using MultipleTextFileOutputFormat to feed the TextFile version of this table, and it worked well.

Any hints?

Thanks,
Abhi
