OK, so what are the default separators for all of the ROW FORMAT
DELIMITED options (fields, collection items, map keys, lines)?
I already tested modifying my data to use ^A instead of "|", and it
started working much better. I don't think DynamicSerDe is working as
advertised: I've tried every combination of CREATE TABLE statements
I can think of, and I can't get a single one to work except the
default one.
Thanks, guys. I'm super excited about getting to use, and hopefully
contribute to, Hive.
Josh Ferguson
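For reference, the ^A fix matches Hive's documented defaults: \001
(Ctrl-A) between fields, \002 between collection items, \003 between
map keys and values, and a newline between rows. A table created with
no ROW FORMAT clause behaves as if those delimiters were spelled out;
a minimal sketch using the schema from this thread, in the
octal-escape form Hive's DDL accepts for non-printing characters:

hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/sample';

The matching data line for the sample row would then be
1227422134^A2^A1^Apaid^C44519^Btax^C2120^Bvalue^C42399
where ^A, ^B, and ^C stand for the bytes \001, \002, and \003.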
On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:
Can you please send the output of 'describe extended activity_test'?
This will help us understand what's happening with all of the CREATE
TABLE parameters.
Also - as a sanity check - can you please run
hadoop dfs -cat /data/sample/*
(to make sure the data got loaded/moved into that dir)?
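Both sanity checks, at the prompts used elsewhere in this thread:

hive> DESCRIBE EXTENDED activity_test;
$ hadoop dfs -cat /data/sample/*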
-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 7:03 PM
To: [email protected]
Subject: Re: Trouble Loading Into External Table
hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '124'
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION '/data/sample';
OK
hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK
$ hadoop fs -cat /data/sample/sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399
hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
SELECT activity_test.occurred_at, activity_test.actor_id,
activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0022, Tracking URL =
http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
Kill Command = /Users/josh/Hadoop/bin/hadoop job
-Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200811250653_0022
Moving data to: /data/output2
OK
$ hadoop fs -cat /data/output2/*
012{}
Still getting incorrect results. Is there anything else I could try?
Josh Ferguson
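One check worth making on output that looks like 012{}: by default,
INSERT OVERWRITE DIRECTORY writes text with Ctrl-A (\001) between
columns, and hadoop fs -cat renders those bytes invisibly. Piping
through cat -v, or dumping with od -c, makes any hidden separators
visible. A minimal sketch against the output path above:

$ hadoop fs -cat /data/output2/* | cat -v
$ hadoop fs -cat /data/output2/* | od -c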
On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
Can you try putting the ASCII value within quotes? So, for example,
FIELDS TERMINATED BY '124', etc.
You can also look at the following file in the source tree to see an
example of how this is done:
ql/src/test/queries/clientpositive/input_dynamicserde.q
Ashish
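A note for later readers: in Hive versions where the DELIMITED clause
accepts a single-character delimiter, the terminator can also be
written as a literal character or an octal escape rather than as a
decimal ASCII string. A sketch of the clause in that form (assuming
such a version; octal \174 is '|'):

ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' -- or the octal escape '\174'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'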
-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 6:18 PM
To: [email protected]
Subject: Trouble Loading Into External Table
OK, so I'm trying to create an external table, load a delimited file
into it, and then do a basic SELECT out of it. Here is a description
of my scenario, along with the steps I took and their results.
Hopefully someone can help me figure out what I'm doing wrong.
# Sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399
# CREATE TABLE
hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "|"
COLLECTION ITEMS TERMINATED BY ","
MAP KEYS TERMINATED BY ":"
LOCATION '/data/sample';
OK
# LOAD DATA
hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK
# INSERT OVERWRITE DIRECTORY
hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
SELECT activity_test.occurred_at, activity_test.actor_id,
activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0018, Tracking URL =
http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
Kill Command = /Users/josh/Hadoop/bin/hadoop job
-Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200811250653_0018
Moving data to: /data/output
OK
Time taken: 72.329 seconds
$ hadoop fs -cat /data/output/*
012{}
This obviously isn't the correct output; these are just default
values for those columns. What am I doing wrong?
Thanks
Josh Ferguson
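Once the declared delimiters actually take effect, a quick way to
confirm that the row parses is to select the scalar columns plus one
map entry (standard Hive map-indexing syntax; the expected values
follow from the sample row above):

hive> SELECT occurred_at, actor_id, properties['paid']
FROM activity_test;

For the sample row this should return 1227422134, 2, and 44519.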