OK, so what are the default separators for all of the ROW FORMAT
DELIMITED options (fields, collection items, map keys, lines)?
I already tested modifying my data to use ^A instead of "|", and it
started working much better. I don't think DynamicSerDe is working as
advertised: I've tried every combination of CREATE TABLE statements
I can think of, and I can't get a single one to work except the
default one.
Thanks, guys. I'm super excited about getting to use, and hopefully
contribute to, Hive.
Josh Ferguson
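For reference, the ^A fix matches Hive's documented defaults: \001
(Ctrl-A) between fields, \002 between collection items, \003 between
map keys and values, and a newline between rows. A table created with
no ROW FORMAT clause behaves as if those delimiters were spelled out;
a minimal sketch using the schema from this thread, in the
octal-escape form Hive's DDL accepts for non-printing characters:

hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/sample';

The matching data line for the sample row would then be
1227422134^A2^A1^Apaid^C44519^Btax^C2120^Bvalue^C42399
where ^A, ^B, and ^C stand for the bytes \001, \002, and \003.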
On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:
Can you please send the output of 'describe extended activity_test'?
This will help us understand what's happening with all of the CREATE
TABLE parameters.
Also - as a sanity check - can you please run
hadoop dfs -cat /data/sample/*
(to make sure the data got loaded/moved into that dir)?
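Both sanity checks, at the prompts used elsewhere in this thread:

hive> DESCRIBE EXTENDED activity_test;
$ hadoop dfs -cat /data/sample/*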
-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 7:03 PM
To: [email protected]
Subject: Re: Trouble Loading Into External Table
hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '124'
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
LINES TERMINATED BY '10'
STORED AS TEXTFILE
LOCATION '/data/sample';
OK
hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK
$ hadoop fs -cat /data/sample/sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399
hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
SELECT activity_test.occurred_at, activity_test.actor_id,
activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0022, Tracking URL =
http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
Kill Command = /Users/josh/Hadoop/bin/hadoop job
-Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200811250653_0022
Moving data to: /data/output2
OK
$ hadoop fs -cat /data/output2/*
012{}
Still getting incorrect results. Is there anything else I could try?
Josh Ferguson
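One check worth making on output that looks like 012{}: by default,
INSERT OVERWRITE DIRECTORY writes text with Ctrl-A (\001) between
columns, and hadoop fs -cat renders those bytes invisibly. Piping
through cat -v, or dumping with od -c, makes any hidden separators
visible. A minimal sketch against the output path above:

$ hadoop fs -cat /data/output2/* | cat -v
$ hadoop fs -cat /data/output2/* | od -c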
On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
Can you try putting the ASCII value within quotes? So, for example,
FIELDS TERMINATED BY '124', etc.
You can also look at the following file in the source tree to see an
example of how this is done:
ql/src/test/queries/clientpositive/input_dynamicserde.q
Ashish
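A note for later readers: in Hive versions where the DELIMITED clause
accepts a single-character delimiter, the terminator can also be
written as a literal character or an octal escape rather than as a
decimal ASCII string. A sketch of the clause in that form (assuming
such a version; octal \174 is '|'):

ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' -- or the octal escape '\174'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'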
-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 6:18 PM
To: [email protected]
Subject: Trouble Loading Into External Table
OK, so I'm trying to create an external table, load a delimited file
into it, and then do a basic SELECT out of it. Here is a description
of my scenario, along with the steps I took and their results.
Hopefully someone can help me figure out what I'm doing wrong.
# Sample.tab
1227422134|2|1|paid:44519,tax:2120,value:42399
# CREATE TABLE
hive> CREATE EXTERNAL TABLE activity_test
(occurred_at INT, actor_id INT, actee_id INT, properties
MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "|"
COLLECTION ITEMS TERMINATED BY ","
MAP KEYS TERMINATED BY ":"
LOCATION '/data/sample';
OK
# LOAD DATA
hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
activity_test;
Copying data from file:/Users/josh/Hive/sample.tab
Loading data to table activity_test
OK
# INSERT OVERWRITE DIRECTORY
hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
SELECT activity_test.occurred_at, activity_test.actor_id,
activity_test.actee_id, activity_test.properties;
Total MapReduce jobs = 1
Starting Job = job_200811250653_0018, Tracking URL =
http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
Kill Command = /Users/josh/Hadoop/bin/hadoop job
-Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200811250653_0018
Moving data to: /data/output
OK
Time taken: 72.329 seconds
$ hadoop fs -cat /data/output/*
012{}
This obviously isn't the correct output; these are just default
values for those columns. What am I doing wrong?
Thanks
Josh Ferguson
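Once the declared delimiters actually take effect, a quick way to
confirm that the row parses is to select the scalar columns plus one
map entry (standard Hive map-indexing syntax; the expected values
follow from the sample row above):

hive> SELECT occurred_at, actor_id, properties['paid']
FROM activity_test;

For the sample row this should return 1227422134, 2, and 44519.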