Re: getting all null values

Bobby Rullo Mon, 05 Oct 2009 11:45:42 -0700

I'm actually having a similar problem with using sequence files.


I create a table like so:

CREATE TABLE txns (
       tid STRING,
       txn STRING
)
PARTITIONED BY(dt DATETIME)
STORED AS SEQUENCEFILE;

And then I point to one our existing transaction log files, which aregzipped compressed sequence files where the 'tid' is the key and 'txn'is the value:

ALTER TABLE txns ADD PARTITION (dt="2009_04_01") LOCATION '/user/br/2009_04_01';


Then I go to inspect the table like:

SELECT txn from txns limit 3;

and it's just NULL's. So then I try to see what is in tid:
        
SELECT tid from txns limit 3;

And it's the txn field!

So it seems that Hive is ignoring the 'key' in sequence files....Isthere anyway to make that not happen?

On a related note, I noticed that when I dropped the table (DROP TABLEtxns;), the files were not deleted. Why is that? Is it because I addedthem via "ALTER TABLE/ADD PARITITION"? If so, can I rely on thisbehavior in the future? I like being able to drop a table and notworry about all my files disappearing!


Thanks

Bobby

On Sep 18, 2009, at 3:16 PM, Avram Aelony wrote:

Check your field delimiter?

-A





From: Abhijit Pol [mailto:[email protected]]
Sent: Friday, September 18, 2009 2:55 PM
To: [email protected]
Subject: getting all null values
For one of the hive table I switched from TextFile to SequenceFileformat. This is how I created the new table:
CREATE EXTERNAL TABLE IMPRESSIONS ( A STRING, B STRING)
PARTITIONED BY(DATA_DATE STRING COMMENT 'yyyyMMdd (e.g. 20090801) onwhich log records are collected')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION '/user/hadoop/warehouse/facts/impressions/';
This external table is sourced by our custom ETL job which writesdata in MultipleSequenceFileOutputFormat.
When I issue simple query like: SELECT * FROM IMPRESSIONS;
This is what I am getting for all the records:
NULL    NULL    20090715
NULL    NULL    20090715
NULL    NULL    20090715
....
But if I do: hadoop dfs -text /user/hadoop/warehouse/facts/impressions/data_date=20090715/* | less
I get expected output.
Previously I was using MultipleTextFileOutputFormat to feed TextFileversion of this table and it worked well.
Any hints?

Thanks,
Abhi

Re: getting all null values

Reply via email to