The PCollection that is being cached is the result of a parallelDo (so it is a DoCollection). The parent InputCollection has the source I defined. Since the first read is triggered by that .cache() call on the DoCollection, the file source given in the InputCollection is never actually used.
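To make sure I have the lookup order right, here is a toy model of the behaviour Josh describes in the quoted reply below. It is only a sketch, not Crunch's actual code: `MaterializeSketch`, `ReadableSrc`, `Node` and `readerFor` are all made-up names, and the only assumption it encodes is "an input collection is read through its own source, anything derived from it falls back to the ptype's default file source".

```java
public class MaterializeSketch {
    /** Hypothetical stand-in for a source that can be read directly. */
    interface ReadableSrc {
        String name();
    }

    /** Hypothetical stand-in for a collection in the pipeline graph. */
    static class Node {
        final ReadableSrc source; // set only on input collections
        final Node parent;        // set only on derived collections

        private Node(ReadableSrc source, Node parent) {
            this.source = source;
            this.parent = parent;
        }

        static Node input(ReadableSrc source) {
            return new Node(source, null);
        }

        /** Models parallelDo(): the result is no longer an input collection. */
        Node parallelDo() {
            return new Node(null, this);
        }

        boolean isInput() {
            return source != null;
        }
    }

    /** Picks which reader would serve a materialize/cache of the given node. */
    static String readerFor(Node node, String ptypeDefaultSource) {
        if (node.isInput()) {
            return node.source.name(); // custom source is honoured
        }
        return ptypeDefaultSource;     // parent's source is not consulted
    }

    public static void main(String[] args) {
        Node input = Node.input(() -> "MyCustomSource");
        Node derived = input.parallelDo();
        System.out.println(readerFor(input, "AvroFileSourceTarget"));
        System.out.println(readerFor(derived, "AvroFileSourceTarget"));
    }
}
```

In this model the derived collection never looks at its parent's source, which matches what I see for the DoCollection case.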
Stepping through a debugger shows that even after using pipeline.read(...).cache(), my ReadableSource.read() doesn't get called even though I see my source in outputTargetsToMaterialize.value.source. I only see AvroFileSourceTarget performing the reads so I'm probably missing something (I'm new to crunch).

Any pointers would be appreciated.

Evan

On Thu, 2016-01-14 at 15:29 -0800, Josh Wills wrote:
> Hey Evan,
>
> So I must be missing some context here-- it looks to me like
> MRPipeline.getMaterializeSourceTarget first checks to see if you
> passed in an input collection type and then looks at its Source to
> see if it implements ReadableSource and uses that if it's available
> before defaulting to the ptype's default file source later on in the
> file via createIntermediateOutput. Is the PCollection you're trying
> to materialize not an input collection? e.g., is it an input
> collection that has had parallelDo called it on one or more times?
>
> J
>
> On Thu, Jan 14, 2016 at 9:08 AM, Evan McClain <[email protected]>
> wrote:
>
> > Hi list,
> >
> > I am trying to implement a custom ReadableSource by extending
> > FileSourceImpl, but it doesn't look like my source is actually
> > being used since I am reusing AvroTypes (basically just trying to
> > pass through some avro getMeta() values).
> >
> > It looks like cache() -> materialize() ->
> > ptype.getDefaultFileSource() is being used to perform the actual
> > read (even though pipeline.read() is being passed my custom
> > Source).
> >
> > Is there any way to do this without also implementing a custom
> > PType?
> >
> > Thanks
> > --
> > Evan McClain
> > https://keybase.io/aeroevan

--
Evan McClain
https://keybase.io/aeroevan
