Well, in theory yes, though I didn't test that. It could be an interaction between having a default and not having a default, or an interaction with which non-default value you choose. Creating a table that uses a field separator of 1 would let you check that.
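
For example, something along these lines (a sketch only; the table name is hypothetical, and it assumes the same columns and data location as the working "basic" table, with '1' given as a decimal code the same way '44' and '58' are):

CREATE EXTERNAL TABLE basic_field_test
(actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE
LOCATION '/data/sample2';

If a select against that table still comes back correct, the problem is specific to choosing a non-default field delimiter like '124', rather than to specifying a field delimiter at all.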

Josh

On Nov 26, 2008, at 1:19 PM, Pete Wyckoff wrote:

So, the data then was not field delimited by '124' and must be delimited by '1'?

So, if you had specified '1' as the field delim instead of '124' in the initial create, it should have worked then?


Thanks, pete


On 11/26/08 12:33 PM, "Josh Ferguson" <[EMAIL PROTECTED]> wrote:

This is the DESCRIBE EXTENDED output for the properly working table:

hive> DESCRIBE EXTENDED basic;
OK
actor_id        int
actee_id        int
properties      map<string,string>
Detailed Table Information:
Table(tableName:basic,dbName:default,owner:josh,createTime:1227688761,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:int,comment:null), FieldSchema(name:actee_id,type:int,comment:null), FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/data/sample2,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{EXTERNAL=TRUE})

Josh Ferguson

On Nov 26, 2008, at 11:25 AM, Pete Wyckoff wrote:


 Can you send the new output of describe extended?

 Thanks, pete


 On 11/26/08 6:46 AM, "Ashish Thusoo" <[EMAIL PROTECTED]> wrote:


congrats!! :)

Please do file a JIRA for this. We will fix this as soon as possible.

 Ashish

 ________________________________________
 From: Josh Ferguson [EMAIL PROTECTED]
 Sent: Wednesday, November 26, 2008 12:44 AM
 To: [email protected]
 Subject: Re: Trouble Loading Into External Table

 I got it! The combination I got to work was the following:

 CREATE EXTERNAL TABLE basic
 (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE
 LOCATION '/data/sample2';

 This forced a DynamicSerDe/TCTLSeparatedProtocol combo, which didn't
 work properly when I specified the field delimiter but works fine
 when you only specify the other two and leave the field delimiter as
 the default, for some reason. I should probably file it in JIRA.
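
 For what it's worth, the same check used earlier in the thread should confirm the map column is being parsed; a sketch (assuming an arbitrary, hypothetical output directory '/data/output3'):

 FROM basic INSERT OVERWRITE DIRECTORY '/data/output3'
 SELECT basic.actor_id, basic.actee_id, basic.properties;

 A hadoop fs -cat /data/output3/* afterwards should show the real ids and key/value pairs rather than empty defaults.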

 Josh Ferguson

 On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:

 >
 > Can you please send the output of 'describe extended
 > activity_test'. This will help us understand what's happening with
 > all the create table parameters.
 >
 > Also - as a sanity check - can you please check hadoop dfs -cat
 > /data/sample/* (to make sure data got loaded/moved into that dir)
 >
 > -----Original Message-----
 > From: Josh Ferguson [mailto:[EMAIL PROTECTED]
 > Sent: Tuesday, November 25, 2008 7:03 PM
 > To: [email protected]
 > Subject: Re: Trouble Loading Into External Table
 >
 > hive> CREATE EXTERNAL TABLE activity_test
 >> (occurred_at INT, actor_id INT, actee_id INT, properties
 > MAP<STRING, STRING>)
 >> ROW FORMAT DELIMITED
 >> FIELDS TERMINATED BY '124'
 >> COLLECTION ITEMS TERMINATED BY '44'
 >> MAP KEYS TERMINATED BY '58'
 >> LINES TERMINATED BY '10'
 >> STORED AS TEXTFILE
 >> LOCATION '/data/sample';
 > OK
 >
> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
 > activity_test;
 > Copying data from file:/Users/josh/Hive/sample.tab
 > Loading data to table activity_test
 > OK
 >
 > $ hadoop fs -cat /data/sample/sample.tab
 > 1227422134|2|1|paid:44519,tax:2120,value:42399
 >
 > hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
 > SELECT activity_test.occurred_at, activity_test.actor_id,
 > activity_test.actee_id, activity_test.properties;
 > Total MapReduce jobs = 1
 > Starting Job = job_200811250653_0022, Tracking URL =
 > http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
 > Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
 > Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
 >   map = 0%,  reduce =0%
 >   map = 50%,  reduce =0%
 >   map = 100%,  reduce =0%
 > Ended Job = job_200811250653_0022
 > Moving data to: /data/output2
 > OK
 >
 > $ hadoop fs -cat /data/output2/*
 > 012{}
 >
> Still getting incorrect results; is there anything else I could try?
 >
 > Josh Ferguson
 >
 > On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
 >
 >> Can you try putting the ASCII value within quotes? So, for example,
 >> FIELDS TERMINATED BY '124', etc.
 >>
 >> You can also look at the following file in the source to see an
 >> example of how this is done
 >>
 >> ql/src/test/queries/clientpositive/input_dynamicserde.q
 >>
 >> Ashish
 >>
 >> -----Original Message-----
 >> From: Josh Ferguson [mailto:[EMAIL PROTECTED]
 >> Sent: Tuesday, November 25, 2008 6:18 PM
 >> To: [email protected]
 >> Subject: Trouble Loading Into External Table
 >>
 >> OK, so I'm trying to create an external table, load a delimited
 >> file into it, and then just do a basic select out of it. Here is a
 >> description of my scenario along with the steps I took and the results.
 >> Hopefully someone can help me figure out what I'm doing wrong.
 >>
 >> # Sample.tab
 >>
 >> 1227422134|2|1|paid:44519,tax:2120,value:42399
 >>
 >> # CREATE TABLE
 >>
 >> hive> CREATE EXTERNAL TABLE activity_test
 >>> (occurred_at INT, actor_id INT, actee_id INT, properties
 >>> MAP<STRING, STRING>)
 >>> ROW FORMAT DELIMITED
 >>> FIELDS TERMINATED BY "|"
 >>> COLLECTION ITEMS TERMINATED BY ","
 >>> MAP KEYS TERMINATED BY ":"
 >>> LOCATION '/data/sample';
 >> OK
 >>
 >> # LOAD DATA
 >>
>> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
 >> activity_test;
 >> Copying data from file:/Users/josh/Hive/sample.tab
 >> Loading data to table activity_test
 >> OK
 >>
 >> # SELECT OVERWRITE DIRECTORY
 >>
 >> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
 >> SELECT activity_test.occurred_at, activity_test.actor_id,
 >> activity_test.actee_id, activity_test.properties;
 >> Total MapReduce jobs = 1
 >> Starting Job = job_200811250653_0018, Tracking URL =
 >> http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
 >> Kill Command = /Users/josh/Hadoop/bin/hadoop job  -
 >> Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
 >>   map = 0%,  reduce =0%
 >>   map = 50%,  reduce =0%
 >>   map = 100%,  reduce =0%
 >> Ended Job = job_200811250653_0018
 >> Moving data to: /data/output
 >> OK
 >> Time taken: 72.329 seconds
 >>
 >> $ hadoop fs -cat /data/output/*
 >> 012{}
 >>
 >> This obviously isn't the correct output; these are just some default
 >> values for those columns. What am I doing wrong?
 >>
 >> Thanks
 >>
 >> Josh Ferguson
 >