> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote: > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, > > line 157 > > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157> > > > > In the case where they do not give it a schema, I think that they > > shouldn't have to actually specify that it is a map. It's implicit. > > Furthermore, I think that instead of passing the Schema as an argument, if > > they do an "as (schema)" THEN it should attempt to return a Schema in > > accordance with what you provide. > > > > Open to argument, though. > > Taku Miyakawa wrote: > > In the case where they do not give it a schema, I think that they > shouldn't have to actually specify that it is a map. > > Yes ":map[]" is redundant. I will remove it. > > > if they do an "as (schema)" THEN it should attempt to return a Schema > in accordance with what you provide. > > Does this code meet your point? > > <code> > access = LOAD 'access.log' USING > org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, > ua:chararray); > ua = FOREACH access GENERATE ua; > dump ua; > </code> > > It will be good, but I could not find a way to implement the function in > that manner. LoadFunc seems to have no method which can get an input schema > specified by users. > > Cheolsoo Park wrote: > >> It will be good, but I could not find a way to implement the function > in that manner. LoadFunc seems to have no method which can get an input > schema specified by users. > > You don't need access to the schema given by the AS clause inside > LoadFunc. You can just make LoadMetaData.getSchema() return null and load > every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH > operator after LOAD and does type conversion using the schema given by the AS > clause. > > This is what happens when PigStorage is not given a JSON schema file, but > the output schema is defined in the AS clause. > > Jonathan Coveney wrote: > Cheolsoo is totally correct. That said, this should probably be > documented better. Also, the fact that we rely on an implicit foreach is so > gross... but this is how you do it! > > Taku Miyakawa wrote: > I think the function is similar to JsonLoader, it also takes a schema as > an argument. > > An LTSV file itself contains field names in each row. A column consists > of a name and a value (<name>:<value>). And users may want to select fields > by their names. That is why the loader function should know field names, and > it takes a schema as an argument. > > Let me explain in detail. > > In an LTSV file, orders of columns may differ for each row (although it > is not a common case). And a row may lack some columns. In this example, > the second row lacks "ua" column, and the third row has a different order of > columns, but it's OK. > > {data} > host:pc.example.com req:GET /index.html ua:Opera/9.80 > host:user.example.net req:GET /favicon.ico > ua:Mozilla/5.0 req:GET /news.html host:workstation.example.org > {/data} > > To load "host" column and "ua" column into Pig fields, the loader > function should know their names, because it cannot depend on positions of > columns. > > {code} > access = LOAD 'access.log' USING LTSVLoader('host, ua'); > DUMP access; > > -- Output: > -- (pc.example.com, Opera/9.80) > -- (user.example.net,) > -- (workstation.example.org, Mozilla/5.0) > {/code} >
Hmm, ask on the dev@ mailing list about this. I understand your reasoning and it is reasonable, but that said, this syntax irks me. I think there has to be a way to do this, but I am not sure what it is. If there isn't let's add one :) But there should be. Once we nail that, I will give it another go but it looks close to a +1. - Jonathan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9685/#review17239 ----------------------------------------------------------- On March 9, 2013, 8:18 a.m., Taku Miyakawa wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/9685/ > ----------------------------------------------------------- > > (Updated March 9, 2013, 8:18 a.m.) > > > Review request for pig. > > > Description > ------- > > This is a review board for https://issues.apache.org/jira/browse/PIG-3215 > > The patch adds LTSVLoader function and its test class. > > > This addresses bug PIG-3215. > https://issues.apache.org/jira/browse/PIG-3215 > > > Diffs > ----- > > > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java > PRE-CREATION > > contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/9685/diff/ > > > Testing > ------- > > ant compile-test > cd contrib/piggybank/java/ > ant -Dtestcase=TestLTSVLoader test > > > Thanks, > > Taku Miyakawa > >
