Hi,

Thanks for letting me know about the Jira ticket. Yes, it would be necessary to have those partitions as part of the schema to group by them.
J

On 1 Dec 2010, at 16:33, Gerrit Jansen van Vuuren wrote:

> Hi,
>
> You'll have to tell pig in the AS statement what the schema is, e.g.:
>
>   I = LOAD '$INPUT' USING AllLoader() AS ( viewTime:int, userid:long,
>       page_url:chararray, referrer_url:chararray, ip:chararray,
>       country:chararray );
>
> The only problem with the AllLoader currently (until the jira I sent
> earlier is fixed) is that the partition keys won't be in the schema
> itself, but you can still filter by partition using the AllLoader
> constructor, for example AllLoader("date>='2010-11-01'").
>
> Cheers,
> Gerrit
>
>> viewTime INT, userid BIGINT,
>> page_url STRING, referrer_url STRING,
>> ip STRING COMMENT 'IP Address of the User',
>> country STRING COMMENT 'country of origination')
>
> -----Original Message-----
> From: Jae Lee [mailto:jae....@forward.co.uk]
> Sent: Wednesday, December 01, 2010 4:24 PM
> To: dev@pig.apache.org
> Subject: Re: has anyone tried using HiveColumnarLoader over TextFile
> fileformat?
>
> Thanks Gerrit,
>
> yeah it seems to work, in that it loads up the files properly...
>
> however it fails to understand the schema, and there's no way to specify
> the underlying schema...
>
> Would you have any recommendation to get the schema right?
>
> J
>
> On 1 Dec 2010, at 15:48, Gerrit Jansen van Vuuren wrote:
>
>> Hi,
>>
>> Short answer is yes. As long as the partition keys are reflected in the
>> folder path itself, AllLoader will pick them up.
>>
>> Partition keys in hive are (normally, from my understanding) reflected
>> in the file path itself, so that if you have partitions: type, date
>> the table path will actually be
>> $HIVE_ROOT/warehouse/mytable/type=[value]/date=[value]
>>
>> The AllLoader does understand this type of partitioning.
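[Editor's note: the key=value path convention described above can be sketched outside of Pig. This is a minimal illustration in Python of how partition keys are recovered from a hive-style path, not the actual piggybank Java implementation; the helper name and sample path are made up.]

```python
import re

# Hypothetical helper: recover hive-style partition keys from a path.
# Each "key=value" folder segment becomes a partition column, mirroring
# the layout $HIVE_ROOT/warehouse/mytable/type=[value]/date=[value].
def extract_partition_keys(path):
    keys = {}
    for segment in path.split("/"):
        match = re.fullmatch(r"([^=]+)=(.*)", segment)
        if match:
            keys[match.group(1)] = match.group(2)
    return keys

path = "/user/hive/warehouse/page_view/type=web/date=2010-11-01/part-00000"
print(extract_partition_keys(path))  # {'type': 'web', 'date': '2010-11-01'}
```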
>> So that if you point it to load $HIVE_ROOT/warehouse/mytable
>> it will allow you to use the type and date columns to filter (note that
>> you can only specify the filtering in the AllLoader() part, see:
>> https://issues.apache.org/jira/browse/PIG-1717 )
>>
>> The partitioning is detected by the AllLoader (and HiveColumnarLoader)
>> by looking at the actual folders in the path, and reading all key=value
>> patterns in the path name itself, registering these internally as
>> partition keys.
>>
>> -----Original Message-----
>> From: Jae Lee [mailto:jae....@forward.co.uk]
>> Sent: Wednesday, December 01, 2010 2:03 PM
>> To: dev@pig.apache.org
>> Subject: Re: has anyone tried using HiveColumnarLoader over TextFile
>> fileformat?
>>
>> Hi Gerrit,
>>
>> Yeah, the Hive table isn't stored as RCFILE but TEXTFILE,
>> so our table creation DDL looks like the below:
>>
>> CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
>>   page_url STRING, referrer_url STRING,
>>   ip STRING COMMENT 'IP Address of the User',
>>   country STRING COMMENT 'country of origination')
>> COMMENT 'This is the staging page view table'
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> STORED AS TEXTFILE
>>
>> Does AllLoader understand the notion of partition keys, as
>> HiveColumnarLoader does?
>>
>> J
>>
>> On 1 Dec 2010, at 13:48, Gerrit Jansen van Vuuren wrote:
>>
>>> Hi,
>>>
>>> The HiveColumnarLoader can only read files written by hive or the hive
>>> API(s), and has its own InputFormat returning the HiveRCRecordReader.
>>>
>>> Are you trying to read a plain text format?
>>> Under the hood the HiveRCRecordReader uses the hive-specific rc reader
>>> to read the input file, and throws an error if the file is either not
>>> hive rc or is a corrupt hiverc.
>>>
>>> If what you want is a Loader that loads all types of files, have a
>>> look at the AllLoader (latest piggybank trunk).
>>> It uses configuration that you set in the pig.properties to decide on
>>> the fly what loader to use for which files (it does extension, content
>>> and path matching), and it also has the hive-style path partitioning
>>> for dates etc. Using this loader you can point it at a directory with
>>> lzo, gz, bz2, hiverc etc. files in it, and if you set up the loaders
>>> correctly it will load each file with its preconfigured loader.
>>> The javadoc in the class explains how to configure it.
>>>
>>> Cheers,
>>> Gerrit
>>>
>>> -----Original Message-----
>>> From: Jae Lee [mailto:jae....@forward.co.uk]
>>> Sent: Wednesday, December 01, 2010 12:33 PM
>>> To: dev@pig.apache.org
>>> Subject: has anyone tried using HiveColumnarLoader over TextFile
>>> fileformat?
>>>
>>> Hi everyone,
>>>
>>> I've tried using HiveColumnarLoader and I'm getting java.io.IOException:
>>> hdfs://file_path not a RCFile
>>>
>>> I've noticed HiveColumnarLoader is expecting a HiveRCRecordReader from
>>> the prepareToRead method...
>>>
>>> Could you give any guidance on how feasible it would be to modify
>>> HiveRCRecordReader to support any RecordReader?
>>>
>>> J
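[Editor's note: the constructor-style partition filter mentioned earlier in the thread, e.g. AllLoader("date>='2010-11-01'"), can be illustrated with a small sketch. This is a hypothetical Python model of partition pruning, not the piggybank implementation; the function names are made up, only three comparison operators are modelled, and it relies on ISO-formatted dates comparing correctly as plain strings.]

```python
import operator

# Hypothetical model of a partition filter such as "date>='2010-11-01'":
# parse the expression, then test it against the key=value pairs pulled
# out of each partition folder, keeping only the matching partitions.
OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}

def parse_filter(expression):
    for symbol in (">=", "<=", "="):  # try multi-character symbols first
        if symbol in expression:
            key, _, raw = expression.partition(symbol)
            return key.strip(), OPS[symbol], raw.strip().strip("'")
    raise ValueError("unsupported filter: " + expression)

def prune(partitions, expression):
    key, op, value = parse_filter(expression)
    # ISO-formatted dates compare correctly as plain strings.
    return [p for p in partitions if key in p and op(p[key], value)]

partitions = [{"date": "2010-10-31"}, {"date": "2010-11-01"}, {"date": "2010-12-01"}]
print(prune(partitions, "date>='2010-11-01'"))
# [{'date': '2010-11-01'}, {'date': '2010-12-01'}]
```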