Can you send the new output of describe extended?

Thanks,
pete
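For context, the command being requested would be run from the Hive CLI against the table in Josh's message below. The table name 'basic' is taken from that message; the comment sketches the general shape of what DESCRIBE EXTENDED prints rather than a captured transcript:

hive> DESCRIBE EXTENDED basic;
-- Prints each column with its type, followed by a "Detailed Table
-- Information" section naming the SerDe and its serialization
-- parameters; that is what would confirm how the collection and
-- map-key delimiters were actually recorded in the metastore.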
On 11/26/08 6:46 AM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:

congrats!! :) Please do file a JIRA for this. We will fix this as soon as possible.

Ashish
________________________________________
From: Josh Ferguson [EMAIL PROTECTED]
Sent: Wednesday, November 26, 2008 12:44 AM
To: [email protected]
Subject: Re: Trouble Loading Into External Table

I got it! The combination I got to work was the following:

CREATE EXTERNAL TABLE basic (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample2';

This forced a DynamicSerde/TCTLSeparatedProtocol combination, which didn't work properly when I specified the field delimiter but, for some reason, works fine when you only specify the other two and leave the field delimiter as the default. I should probably file it in JIRA.

Josh Ferguson

On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

> Can you please send the output of 'describe extended activity_test'.
> This will help us understand what's happening with all the create
> table parameters.
>
> Also - as a sanity check - can you please check hadoop dfs -cat
> /data/sample/* (to make sure data got loaded/moved into that dir)
>
> -----Original Message-----
> From: Josh Ferguson [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 25, 2008 7:03 PM
> To: [email protected]
> Subject: Re: Trouble Loading Into External Table
>
> hive> CREATE EXTERNAL TABLE activity_test
>> (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
>> ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '124'
>> COLLECTION ITEMS TERMINATED BY '44'
>> MAP KEYS TERMINATED BY '58'
>> LINES TERMINATED BY '10'
>> STORED AS TEXTFILE
>> LOCATION '/data/sample';
> OK
>
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
> Copying data from file:/Users/josh/Hive/sample.tab
> Loading data to table activity_test
> OK
>
> $ hadoop fs -cat /data/sample/sample.tab
> 1227422134|2|1|paid:44519,tax:2120,value:42399
>
> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> SELECT activity_test.occurred_at, activity_test.actor_id,
> activity_test.actee_id, activity_test.properties;
> Total MapReduce jobs = 1
> Starting Job = job_200811250653_0022, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
> Kill Command = /Users/josh/Hadoop/bin/hadoop job -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
> map = 0%, reduce =0%
> map = 50%, reduce =0%
> map = 100%, reduce =0%
> Ended Job = job_200811250653_0022
> Moving data to: /data/output2
> OK
>
> $ hadoop fs -cat /data/output2/*
> 012{}
>
> Still getting incorrect results. Is there anything else I could try?
>
> Josh Ferguson
>
> On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
>
>> Can you try putting the ASCII value within quotes, so for example
>> FIELDS TERMINATED BY '124' etc...
>>
>> You can also look at the following file in the source to see an
>> example of how this is done:
>>
>> ql/src/test/queries/clientpositive/input_dynamicserde.q
>>
>> Ashish
>>
>> -----Original Message-----
>> From: Josh Ferguson [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, November 25, 2008 6:18 PM
>> To: [email protected]
>> Subject: Trouble Loading Into External Table
>>
>> OK, so I'm trying to create an external table, load a delimited file
>> into it, and then do a basic select out of it. Here is a description
>> of my scenario along with the steps I took and their results.
>> Hopefully someone can help me figure out what I'm doing wrong.
>>
>> # Sample.tab
>>
>> 1227422134|2|1|paid:44519,tax:2120,value:42399
>>
>> # CREATE TABLE
>>
>> hive> CREATE EXTERNAL TABLE activity_test
>>> (occurred_at INT, actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
>>> ROW FORMAT DELIMITED
>>> FIELDS TERMINATED BY "|"
>>> COLLECTION ITEMS TERMINATED BY ","
>>> MAP KEYS TERMINATED BY ":"
>>> LOCATION '/data/sample';
>> OK
>>
>> # LOAD DATA
>>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE activity_test;
>> Copying data from file:/Users/josh/Hive/sample.tab
>> Loading data to table activity_test
>> OK
>>
>> # INSERT OVERWRITE DIRECTORY
>>
>> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
>> SELECT activity_test.occurred_at, activity_test.actor_id,
>> activity_test.actee_id, activity_test.properties;
>> Total MapReduce jobs = 1
>> Starting Job = job_200811250653_0018, Tracking URL = http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
>> Kill Command = /Users/josh/Hadoop/bin/hadoop job -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
>> map = 0%, reduce =0%
>> map = 50%, reduce =0%
>> map = 100%, reduce =0%
>> Ended Job = job_200811250653_0018
>> Moving data to: /data/output
>> OK
>> Time taken: 72.329 seconds
>>
>> $ hadoop fs -cat /data/output/*
>> 012{}
>>
>> This obviously isn't the correct output; these are just some default
>> values for those columns. What am I doing wrong?
>>
>> Thanks
>>
>> Josh Ferguson
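As a quick sanity check outside Hive, splitting the sample line from this thread on '|' with standard awk shows the four fields the table definition expects. This is a local sketch against the sample data, not part of the original exchange's output:

$ echo '1227422134|2|1|paid:44519,tax:2120,value:42399' | awk -F'|' '{ print $1; print $2; print $3; print $4 }'
1227422134
2
1
paid:44519,tax:2120,value:42399

Since the raw line splits cleanly on '|', the 012{} default-value result points at the SerDe's handling of the explicit FIELDS TERMINATED BY delimiter rather than at the data file itself, which is consistent with Josh's workaround of leaving the field delimiter at its default.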
