Re: has anyone tried using HiveColumnarLoader over TextFile fileformat?

Jae Lee Wed, 01 Dec 2010 08:34:13 -0800

Also, it doesn't seem to include the partition field in the returned tuple
either.


J
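
For illustration, here is a minimal Python sketch of the key=value path convention described further down the thread, and of appending the detected partition values to a loaded tuple. It is not the piggybank implementation: the function names and the sample path are hypothetical, and it only shows the idea of reading partition keys from the folder names.

```python
def partition_values(path):
    """Return the key=value components of a Hive-style path as a dict.

    Components without '=' (or with an empty key) are ignored.
    """
    parts = {}
    for component in path.split("/"):
        key, sep, value = component.partition("=")
        if sep and key:
            parts[key] = value
    return parts


def with_partitions(row, path):
    """Append the partition values found in `path` to a loaded row (a tuple)."""
    return tuple(row) + tuple(partition_values(path).values())


# Hypothetical file under a table partitioned by type and date.
path = "/warehouse/mytable/type=click/date=2010-12-01/part-00000"
print(partition_values(path))
print(with_partitions((1291192453, 42), path))
```

The partition values come back as strings here; casting them (for example the date) would be up to the caller.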

On 1 Dec 2010, at 16:24, Jae Lee wrote:

> Thanks Gerrit,
> 
> Yes, it seems to work, in that it loads the files properly.
> 
> However, it fails to pick up the schema, and there seems to be no way to
> specify the underlying schema.
> 
> Would you have any recommendation for getting the schema right?
> 
> J
> 
> On 1 Dec 2010, at 15:48, Gerrit Jansen van Vuuren wrote:
> 
>> Hi,
>> 
>> 
>> 
>> Short answer is yes. As long as the partition keys are reflected in the
>> folder path itself AllLoader will pick it up.
>> 
>> Partition keys in hive are (normally, from my understanding) reflected in
>> the file path itself, so that if you have
>> partitions: type, date
>> the table path will actually be
>> $HIVE_ROOT/warehouse/mytable/type=[value]/date=[value]
>> 
>> The AllLoader does understand this type of partitioning, so if you
>> point it to load $HIVE_ROOT/warehouse/mytable
>> it will allow you to use the type and date columns to filter (note that you
>> can only specify the filtering in the AllLoader() part; see:
>> https://issues.apache.org/jira/browse/PIG-1717 )
>> 
>> The partitioning is detected by the AllLoader (and HiveColumnarLoader) by
>> looking at the actual folders in the path, and reading all key=value
>> patterns in the path name itself, registering these internally as partition
>> keys.
>> 
>> 
>> -----Original Message-----
>> From: Jae Lee [mailto:[email protected]] 
>> Sent: Wednesday, December 01, 2010 2:03 PM
>> To: [email protected]
>> Subject: Re: has anyone tried using HiveColumnarLoader over TextFile
>> fileformat?
>> 
>> Hi Gerrit,
>> 
>> Yeah Hive table isn't stored as RCFILE but TEXTFILE
>> 
>> so our table creation ddl looks like below
>> 
>> CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
>>    page_url STRING, referrer_url STRING,
>>    ip STRING COMMENT 'IP Address of the User',
>>    country STRING COMMENT 'country of origination')
>> COMMENT 'This is the staging page view table'
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> STORED AS TEXTFILE
>> 
>> Does AllLoader understand the notion of partition keys, as
>> HiveColumnarLoader does?
>> 
>> J
>> 
>> On 1 Dec 2010, at 13:48, Gerrit Jansen van Vuuren wrote:
>> 
>>> Hi,
>>> 
>>> The HiveColumnarLoader can only read files written by hive or the hive
>>> API(s), and has its own InputFormat returning the HiveRCRecordReader.
>>> 
>>> Are you trying to read a plain text format? 
>>> Under the hood the HiveRCRecordReader uses the hive specific rc reader to
>>> read the input file and throws an error either if the file is not hive rc
>>> or is a corrupt hiverc.
>>> 
>>> 
>>> If what you want is a Loader that loads all types of files, have a look at
>>> the AllLoader (latest piggybank trunk). It uses configuration that you set
>>> in the pig.properties to decide on the fly what loader to use for what
>>> files (does extension, content and path matching), it also has the hive
>>> style path partitioning for dates etc. Using this loader you can point it
>>> at a directory with lzo, gz, bz2, hiverc etc. files in it and if you set up
>>> the loaders correctly it will load each file with its preconfigured loader.
>>> The javadoc in the class explains how to configure it.
>>> 
>>> Cheers,
>>> Gerrit
>>> 
>>> -----Original Message-----
>>> From: Jae Lee [mailto:[email protected]] 
>>> Sent: Wednesday, December 01, 2010 12:33 PM
>>> To: [email protected]
>>> Subject: has anyone tried using HiveColumnarLoader over TextFile
>>> fileformat?
>>> 
>>> Hi everyone.
>>> 
>>> I've tried using HiveColumnarLoader and am getting java.io.IOException:
>>> hdfs://file_path not a RCFile
>>> 
>>> I've noticed HiveColumnarLoader expects a HiveRCRecordReader in its
>>> prepareToRead method.
>>> 
>>> Could you give any guidance on how feasible it would be to modify
>>> HiveRCRecordReader to support any RecordReader?
>>> 
>>> J
>>> 
>>> 
>> 
>> 
>> 
> 
>
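
On the schema question raised in the thread: as a rough sketch only, and not something the loaders discussed here are known to do, the column names and types from the page_view DDL quoted above can be applied by hand to each tab-delimited line. The helper name `parse_row` and the sample line are made up for illustration, and the sketch does not handle NULL markers or malformed numbers.

```python
# Column names and types taken from the page_view DDL quoted above;
# Hive INT and BIGINT both map to Python int, STRING to str.
PAGE_VIEW_SCHEMA = [
    ("viewTime", int),
    ("userid", int),
    ("page_url", str),
    ("referrer_url", str),
    ("ip", str),
    ("country", str),
]


def parse_row(line, schema=PAGE_VIEW_SCHEMA):
    """Split one tab-delimited line and cast each field to its column type."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: cast(raw) for (name, cast), raw in zip(schema, fields)}


# Made-up sample line, for illustration only.
sample = "1291192453\t42\thttp://example.com/a\thttp://example.com/\t10.0.0.1\tUK\n"
print(parse_row(sample))
```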
