Hi,
Short answer is yes. As long as the partition keys are reflected in the folder path itself, AllLoader will pick them up.

Partition keys in Hive are (normally, from my understanding) reflected in the file path, so if you have the partitions type and date, the table path will actually be:

$HIVE_ROOT/warehouse/mytable/type=[value]/date=[value]

The AllLoader does understand this type of partitioning. If you point it at $HIVE_ROOT/warehouse/mytable, it will allow you to use the type and date columns to filter (note that you can only specify the filtering in the AllLoader() part, see: https://issues.apache.org/jira/browse/PIG-1717 ).

The partitioning is detected by the AllLoader (and HiveColumnarLoader) by looking at the actual folders in the path and reading all key=value patterns in the path name itself, registering these internally as partition keys.

-----Original Message-----
From: Jae Lee [mailto:jae....@forward.co.uk]
Sent: Wednesday, December 01, 2010 2:03 PM
To: dev@pig.apache.org
Subject: Re: has anyone tried using HiveColumnarLoader over TextFile fileformat?

Hi Gerrit,

Yeah, the Hive table isn't stored as RCFILE but as TEXTFILE, so our table creation DDL looks like below:

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User',
    country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE

Does AllLoader understand the notion of partition keys, as HiveColumnarLoader does?

J

On 1 Dec 2010, at 13:48, Gerrit Jansen van Vuuren wrote:

> Hi,
>
> The HiveColumnarLoader can only read files written by hive or the hive
> API(s), and has its own InputFormat returning the HiveRCRecordReader.
>
> Are you trying to read a plain text format?
> Under the hood the HiveRCRecordReader uses the hive specific rc reader to
> read the input file and throws an error either if the file is not hive rc or
> is a corrupt hiverc.
>
>
> If what you want is a Loader that loads all types of files, have a look at
> the AllLoader (latest piggybank trunk). It uses configuration that you set
> in the pig.properties to decide on the fly what loader to use for what files
> (does extension, content and path matching), it also has the hive style path
> partitioning for dates etc. Using this loader you can point it at a directory
> with lzo, gz, bz2 hiverc etc files in it and if you set up the loaders
> correctly it will load each file with its preconfigured loader.
> The javadoc in the class explains how to configure it.
>
> Cheers,
> Gerrit
>
> -----Original Message-----
> From: Jae Lee [mailto:jae....@forward.co.uk]
> Sent: Wednesday, December 01, 2010 12:33 PM
> To: dev@pig.apache.org
> Subject: has anyone tried using HiveColumnarLoader over TextFile fileformat?
>
> Hi everyone.
>
> I've tried using HiveColumnarLoader and getting java.io.IOException:
> hdfs://file_path not a RCFile
>
> I've noticed HiveColumnarLoader is expecting HiveRCRecordReader from
> prepareToRead method..
>
> Could you guys give any guidance how possible it is to modify
> HiveRCRecordReader to support any RecordReader?
>
> J
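
P.S. To illustrate the key=value path detection described at the top of this thread, here is a rough Python sketch. It is not the AllLoader's actual (Java) code, only an approximation of the idea; the function name partition_keys and the example path and values are made up for illustration.

```python
from typing import Dict


def partition_keys(path: str) -> Dict[str, str]:
    """Collect every key=value path segment, in path order, as a partition key.

    Hypothetical helper: it only mimics the idea of reading key=value
    folder names, not the real AllLoader implementation.
    """
    keys: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        # Keep only segments of the form key=value with both sides non-empty.
        if sep and key and value:
            keys[key] = value
    return keys


# Hypothetical table path with two partition levels: type and date.
example = partition_keys(
    "/hive/warehouse/mytable/type=click/date=2010-12-01/part-00000"
)
print(example)
```

Folders and files without a key=value name (such as mytable or part-00000 above) are simply skipped, so a plain directory yields no partition keys.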