Hi Eric, chars > 127 are probably NOT supported, since we assume the text is in UTF-8. You might still want to try "-2" in case that works.
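(The likely reason "-2" is worth trying: Hive is written in Java, where byte is signed, ranging from -128 to 127, so 254 is out of range but -2 has the same bit pattern, 0xFE. An illustrative sketch in Python, not Hive code:)

```python
import struct

# Java's byte type is signed (-128..127), so 254 cannot be expressed
# directly -- but -2 has the identical bit pattern (0xFE). This sketch
# just demonstrates the equivalence; it is not Hive code.
unsigned = 254
signed = struct.unpack('b', bytes([unsigned]))[0]  # reinterpret as a signed byte
print(signed)         # -2
print(signed & 0xFF)  # 254
```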
I don't know what encoding will support characters from 0 to 255.

Zheng

On Wed, Jan 14, 2009 at 3:50 PM, Eric Gottschalk <egottsch...@gmail.com> wrote:

> The extended table description for table #1:
>
> hive> CREATE TABLE test(time STRING, ID string) ROW FORMAT DELIMITED FIELDS
> TERMINATED BY '254' STORED AS TEXTFILE;
> OK
> Time taken: 0.089 seconds
> hive> describe extended test;
> OK
> time    string
> id      string
> Detailed Table Information:
> Table(tableName:test,dbName:default,owner:hadoop,createTime:1231969614,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:time,type:string,comment:null),
> FieldSchema(name:id,type:string,comment:null)],location:hdfs://localhost:54310/test,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=254,field.delim=254}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})
> Time taken: 0.105 seconds
>
> For good measure, the table with the tab:
>
> hive> CREATE TABLE test(time STRING, ID string) ROW FORMAT DELIMITED FIELDS
> TERMINATED BY '9' STORED AS TEXTFILE;
> OK
> Time taken: 0.124 seconds
> hive> describe extended test;
> OK
> time    string
> id      string
> Detailed Table Information:
> Table(tableName:test,dbName:default,owner:hadoop,createTime:1231969673,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:time,type:string,comment:null),
> FieldSchema(name:id,type:string,comment:null)],location:hdfs://localhost:54310/test,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:-1,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,parameters:{serialization.format=9,field.delim=9}),bucketCols:[],sortCols:[],parameters:{}),partitionKeys:[],parameters:{})
> Time taken: 0.087 seconds
>
> On Jan 14, 2009, at 1:43 PM, Ashish Thusoo wrote:
>
>> Can you do a
>>
>> describe extended test
>>
>> after you create it with the first create table statement and send out
>> the output.
>>
>> Ashish
>> ________________________________________
>> From: Eric Gottschalk [egottsch...@gmail.com]
>> Sent: Wednesday, January 14, 2009 1:38 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Problem with delimiter
>>
>> I am trying to load a text file that uses ascii(254) as a delimiter.
>> Column #1 is time
>> Column #2 is ID
>>
>> Contents of this file (test_data.txt):
>> 11-25-2008-14:37:44þ546460819054541434
>> 11-25-2008-15:05:34þ546460554662946399
>> 11-25-2008-15:06:21þ556460862552067034
>> 11-25-2008-15:17:51þ576460863857903582
>> 11-25-2008-14:37:44þ546460819054541434
>> 11-25-2008-15:05:34þ546460554662946399
>> 11-25-2008-15:06:21þ556460862552067034
>> 11-25-2008-15:17:51þ576460863857903582
>> 11-25-2008-14:37:44þ546460819054541434
>> 11-25-2008-15:05:34þ546460554662946399
>> 11-25-2008-15:06:21þ556460862552067034
>> 11-25-2008-15:17:51þ576460863857903582
>> 11-25-2008-14:37:44þ546460819054541434
>> 11-25-2008-15:05:34þ546460554662946399
>> 11-25-2008-15:06:21þ556460862552067034
>> 11-25-2008-15:17:51þ576460863857903582
>>
>> I create a table in hive with the following syntax:
>>
>> hive> CREATE TABLE test(time STRING, ID string) ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '254' STORED AS TEXTFILE;
>> OK
>> Time taken: 8.891 seconds
>>
>> After this, I load the file:
>>
>> hive> LOAD DATA INPATH 'hdfs://localhost:54310/test_data.txt' OVERWRITE
>> INTO TABLE test;
>> Loading data to table test
>> OK
>> Time taken: 0.623 seconds
>>
>> To make sure that the data is loaded correctly, I select only the first
>> column (time):
>>
>> hive> select time from test;
>> Total MapReduce jobs = 1
>> Starting Job = job_200901141536_0006, Tracking URL =
>> http://localhost:50030/jobdetails.jsp?jobid=job_200901141536_0006
>> Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
>> -Dmapred.job.tracker=localhost:54311 -kill job_200901141536_0006
>> map = 0%, reduce = 0%
>> map = 50%, reduce = 0%
>> map = 100%, reduce = 0%
>> Ended Job = job_200901141536_0006
>> OK
>> 11-25-2008-14:37:44�546460819054541434
>> 11-25-2008-15:05:34�546460554662946399
>> 11-25-2008-15:06:21�556460862552067034
>> 11-25-2008-15:17:51�576460863857903582
>> 11-25-2008-14:37:44�546460819054541434
>> 11-25-2008-15:05:34�546460554662946399
>> 11-25-2008-15:06:21�556460862552067034
>> 11-25-2008-15:17:51�576460863857903582
>> 11-25-2008-14:37:44�546460819054541434
>> 11-25-2008-15:05:34�546460554662946399
>> 11-25-2008-15:06:21�556460862552067034
>> 11-25-2008-15:17:51�576460863857903582
>> 11-25-2008-14:37:44�546460819054541434
>> 11-25-2008-15:05:34�546460554662946399
>> 11-25-2008-15:06:21�556460862552067034
>> 11-25-2008-15:17:51�576460863857903582
>> Time taken: 12.25 seconds
>>
>> It is obvious it did not parse the columns correctly.
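(The '�' characters in the output above are telling: 0xFE is never a valid byte in UTF-8, so a UTF-8 decoder replaces it with the U+FFFD replacement character. A small Python sketch reproducing the effect, using a hypothetical sample line mirroring test_data.txt:)

```python
# A lone 0xFE byte is invalid in UTF-8, so decoding a þ-delimited line
# as UTF-8 yields the U+FFFD replacement character -- which is why the
# select output shows '�' where the delimiter byte was.
raw = b"11-25-2008-14:37:44\xfe546460819054541434"
decoded = raw.decode("utf-8", errors="replace")
print(decoded)  # 11-25-2008-14:37:44�546460819054541434
```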
>>
>> To narrow the problem down, I created a version of this file where the
>> delimiter is a tab.
>>
>> Contents of this file (test_data_tab.txt):
>> 11-25-2008-15:05:34 546460554662946399
>> 11-25-2008-15:06:21 556460862552067034
>> 11-25-2008-15:17:51 576460863857903582
>> 11-25-2008-14:37:44 546460819054541434
>> 11-25-2008-15:05:34 546460554662946399
>> 11-25-2008-15:06:21 556460862552067034
>> 11-25-2008-15:17:51 576460863857903582
>> 11-25-2008-14:37:44 546460819054541434
>> 11-25-2008-15:05:34 546460554662946399
>> 11-25-2008-15:06:21 556460862552067034
>> 11-25-2008-15:17:51 576460863857903582
>> 11-25-2008-14:37:44 546460819054541434
>> 11-25-2008-15:05:34 546460554662946399
>> 11-25-2008-15:06:21 556460862552067034
>> 11-25-2008-15:17:51 576460863857903582
>>
>> I use nearly identical syntax to create the new table. The only change is
>> the new field delimiter '9' (tab):
>>
>> hive> CREATE TABLE test(time STRING, ID string) ROW FORMAT DELIMITED
>> FIELDS TERMINATED BY '9' STORED AS TEXTFILE;
>> OK
>> Time taken: 0.105 seconds
>>
>> Next I load the file:
>>
>> hive> LOAD DATA INPATH 'hdfs://localhost:54310/test_data_tab.txt'
>> OVERWRITE INTO TABLE test;
>> Loading data to table test
>> OK
>> Time taken: 0.179 seconds
>>
>> Check to see if the data has been loaded using the same query.
>>
>> hive> select time from test;
>> Total MapReduce jobs = 1
>> Starting Job = job_200901141536_0007, Tracking URL =
>> http://localhost:50030/jobdetails.jsp?jobid=job_200901141536_0007
>> Kill Command = /home/hadoop/hadoop/bin/../bin/hadoop job
>> -Dmapred.job.tracker=localhost:54311 -kill job_200901141536_0007
>> map = 0%, reduce = 0%
>> map = 50%, reduce = 0%
>> map = 100%, reduce = 0%
>> Ended Job = job_200901141536_0007
>> OK
>> 11-25-2008-15:05:34
>> 11-25-2008-15:06:21
>> 11-25-2008-15:17:51
>> 11-25-2008-14:37:44
>> 11-25-2008-15:05:34
>> 11-25-2008-15:06:21
>> 11-25-2008-15:17:51
>> 11-25-2008-15:05:34
>> 11-25-2008-15:06:21
>> 11-25-2008-15:17:51
>> 11-25-2008-14:37:44
>> 11-25-2008-15:05:34
>> 11-25-2008-15:06:21
>> 11-25-2008-15:17:51
>> 11-25-2008-14:37:44
>> Time taken: 10.553 seconds
>>
>> Success. The import with ascii(9) worked, but ascii(254) failed. Is there
>> any way to load the original data file with the ascii(254) delimiter?

--
Yours,
Zheng
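(A practical fallback, given that the tab-delimited load above succeeded, is to translate the 0xFE delimiter byte to a tab before putting the file into HDFS. This workaround is not from the thread and only applies if the data itself never contains a tab; the sample line is hypothetical, mirroring test_data.txt:)

```python
# Workaround sketch (an assumption, not from the thread): rewrite the
# 0xFE delimiter as a tab, which Hive handled correctly, before loading.
# Only safe when the field values contain no tab or 0xFE bytes.
line = b"11-25-2008-14:37:44\xfe546460819054541434"
converted = line.replace(b"\xfe", b"\t")
print(converted)  # b'11-25-2008-14:37:44\t546460819054541434'
```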