Can you do a describe extended test
after you create it with the first create table statement and send out the output?

Ashish

________________________________________
From: Eric Gottschalk [egottsch...@gmail.com]
Sent: Wednesday, January 14, 2009 1:38 PM
To: hive-user@hadoop.apache.org
Subject: Problem with delimiter

I am trying to load a text file that uses ascii(254) as a delimiter.
Column #1 is time
Column #2 is ID

Contents of this file (test_data.txt):

11-25-2008-14:37:44þ546460819054541434
11-25-2008-15:05:34þ546460554662946399
11-25-2008-15:06:21þ556460862552067034
11-25-2008-15:17:51þ576460863857903582
11-25-2008-14:37:44þ546460819054541434
11-25-2008-15:05:34þ546460554662946399
11-25-2008-15:06:21þ556460862552067034
11-25-2008-15:17:51þ576460863857903582
11-25-2008-14:37:44þ546460819054541434
11-25-2008-15:05:34þ546460554662946399
11-25-2008-15:06:21þ556460862552067034
11-25-2008-15:17:51þ576460863857903582
11-25-2008-14:37:44þ546460819054541434
11-25-2008-15:05:34þ546460554662946399
11-25-2008-15:06:21þ556460862552067034
11-25-2008-15:17:51þ576460863857903582

I create a table in Hive with the following syntax:

hive> CREATE TABLE test(time STRING, ID STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '254' STORED AS TEXTFILE;
OK
Time taken: 8.891 seconds

After this, I load the file:

hive> LOAD DATA INPATH 'hdfs://localhost:54310/test_data.txt' OVERWRITE INTO TABLE test;
Loading data to table test
OK
Time taken: 0.623 seconds

To make sure that the data is loaded correctly, I select only the first column (time):

hive> select time from test;
Total MapReduce jobs = 1
Starting Job = job_200901141536_0006, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200901141536_0006
Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_200901141536_0006
map = 0%, reduce = 0%
map = 50%, reduce = 0%
map = 100%, reduce = 0%
Ended Job = job_200901141536_0006
OK
11-25-2008-14:37:44�546460819054541434
11-25-2008-15:05:34�546460554662946399
11-25-2008-15:06:21�556460862552067034
11-25-2008-15:17:51�576460863857903582
11-25-2008-14:37:44�546460819054541434
11-25-2008-15:05:34�546460554662946399
11-25-2008-15:06:21�556460862552067034
11-25-2008-15:17:51�576460863857903582
11-25-2008-14:37:44�546460819054541434
11-25-2008-15:05:34�546460554662946399
11-25-2008-15:06:21�556460862552067034
11-25-2008-15:17:51�576460863857903582
11-25-2008-14:37:44�546460819054541434
11-25-2008-15:05:34�546460554662946399
11-25-2008-15:06:21�556460862552067034
11-25-2008-15:17:51�576460863857903582
Time taken: 12.25 seconds

It obviously did not parse the columns correctly. To narrow the problem down, I created a version of this file where the delimiter is a tab.

Contents of this file (test_data_tab.txt):

11-25-2008-15:05:34	546460554662946399
11-25-2008-15:06:21	556460862552067034
11-25-2008-15:17:51	576460863857903582
11-25-2008-14:37:44	546460819054541434
11-25-2008-15:05:34	546460554662946399
11-25-2008-15:06:21	556460862552067034
11-25-2008-15:17:51	576460863857903582
11-25-2008-14:37:44	546460819054541434
11-25-2008-15:05:34	546460554662946399
11-25-2008-15:06:21	556460862552067034
11-25-2008-15:17:51	576460863857903582
11-25-2008-14:37:44	546460819054541434
11-25-2008-15:05:34	546460554662946399
11-25-2008-15:06:21	556460862552067034
11-25-2008-15:17:51	576460863857903582

I use a nearly identical syntax to create the new table. The only change is the new field delimiter '9' (tab):

hive> CREATE TABLE test(time STRING, ID STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '9' STORED AS TEXTFILE;
OK
Time taken: 0.105 seconds

Next I load the file:

hive> LOAD DATA INPATH 'hdfs://localhost:54310/test_data_tab.txt' OVERWRITE INTO TABLE test;
Loading data to table test
OK
Time taken: 0.179 seconds

Then I check whether the data has been loaded, using the same query:
hive> select time from test;
Total MapReduce jobs = 1
Starting Job = job_200901141536_0007, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200901141536_0007
Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_200901141536_0007
map = 0%, reduce = 0%
map = 50%, reduce = 0%
map = 100%, reduce = 0%
Ended Job = job_200901141536_0007
OK
11-25-2008-15:05:34
11-25-2008-15:06:21
11-25-2008-15:17:51
11-25-2008-14:37:44
11-25-2008-15:05:34
11-25-2008-15:06:21
11-25-2008-15:17:51
11-25-2008-15:05:34
11-25-2008-15:06:21
11-25-2008-15:17:51
11-25-2008-14:37:44
11-25-2008-15:05:34
11-25-2008-15:06:21
11-25-2008-15:17:51
11-25-2008-14:37:44
Time taken: 10.553 seconds

Success. The import with ascii(9) worked, but ascii(254) failed. Is there any way to load the original data file with the ascii(254) delimiter?
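One check worth making before digging into Hive itself is how the 'þ' delimiter is actually stored on disk. In Latin-1 it is the single byte 0xFE (decimal 254), but in UTF-8 it is the two-byte sequence 0xC3 0xBE, which a single-byte delimiter of 254 can never match. A minimal Python sketch illustrating the difference (the `delimiter_bytes` helper is made up for illustration, not part of Hive or the thread above):

```python
# How does 'þ' (U+00FE) encode under the two plausible encodings of the file?
# Latin-1: the single byte 0xFE (decimal 254) -- a 254 delimiter can match.
# UTF-8:   the two bytes 0xC3 0xBE -- a single-byte 254 delimiter never matches.

latin1_line = "11-25-2008-14:37:44þ546460819054541434".encode("latin-1")
utf8_line = "11-25-2008-14:37:44þ546460819054541434".encode("utf-8")

def delimiter_bytes(raw: bytes) -> list:
    """Return the byte values that are neither digits nor '-' nor ':',
    i.e. whatever sits between the timestamp and the ID."""
    return [b for b in raw if not (0x30 <= b <= 0x39 or b in b"-:")]

print(delimiter_bytes(latin1_line))  # [254]
print(delimiter_bytes(utf8_line))    # [195, 190]
```

If the file turns out to be UTF-8, that two-byte sequence would explain why '254' fails while tab succeeds: a tab is a single byte (0x09) in every common encoding.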