Re: [appengine-java] Re: mapreduce - passing filters

Nacho Coloma Wed, 24 Nov 2010 10:46:56 -0800

> One other thought: instead of adding a GQL interpreter, you might just
> add a hook for loading a class provided by the user. That class would
> implement a Filter interface with a method that takes a Configuration
> and returns a Query object. In your example, mailing and timestamp
> would get passed in as Configuration parameters, and a Query object
> corresponding to your GQL statement would be built by a Filter
> class provided by the user. It would act kind of like a templating
> language for building queries. Make sense/sound like a good idea?


Actually, Filter would be simpler to implement and GQL can be added later as
a concrete Filter implementation if someone is still missing it (I doubt
it). It also solves the problem of specifying the type of arguments.

BTW, arguments (like "timestamp greater than" or "process all comments by
user X") should be passed in as request parameters, not configuration
attributes. This means that Filter may need encapsulated access to some
methods of AppEngineJobContext.request.
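
A minimal sketch of how such a hook might look, just to make the idea
concrete. Everything here is hypothetical: QuerySpec is a stand-in for the
datastore Query so the example runs without the SDK, FilterParams is an
assumed read-only wrapper around the request parameters, and MailingFilter
is an invented user class. Parsing each parameter inside the Filter is what
takes care of the argument types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a datastore query: an entity kind plus filter clauses.
// A real patch would presumably build the SDK's Query object instead.
final class QuerySpec {
    final String kind;
    final List<String> clauses = new ArrayList<>();

    QuerySpec(String kind) {
        this.kind = kind;
    }

    // Records one "property operator value" clause and returns this for chaining.
    QuerySpec where(String property, String operator, Object value) {
        clauses.add(property + " " + operator + " " + value);
        return this;
    }

    @Override
    public String toString() {
        return clauses.isEmpty() ? kind : kind + " WHERE " + String.join(" AND ", clauses);
    }
}

// Hypothetical read-only view of the request parameters, so that a Filter
// never touches the servlet request directly.
interface FilterParams {
    String get(String name);
}

// Hypothetical user-supplied hook: turns request parameters into a query.
interface Filter {
    QuerySpec buildQuery(FilterParams params);
}

// Invented example: select users with a given mailing flag, up to a timestamp.
final class MailingFilter implements Filter {
    @Override
    public QuerySpec buildQuery(FilterParams params) {
        // The Filter decides the type of each argument.
        boolean mailing = Boolean.parseBoolean(params.get("mailing"));
        long timestamp = Long.parseLong(params.get("timestamp"));
        return new QuerySpec("users")
                .where("mailing", "=", mailing)
                .where("timestamp", "<=", timestamp);
    }
}

final class FilterSketch {
    public static void main(String[] args) {
        // Stand-in for the parameters of the request that starts the job.
        Map<String, String> request = new HashMap<>();
        request.put("mailing", "true");
        request.put("timestamp", "1290624416");

        Filter filter = new MailingFilter();
        System.out.println(filter.buildQuery(request::get));
    }
}
```

The framework would only need to know the name of the Filter class; how it
is configured and instantiated is left open here.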

It seems that it can be implemented in a couple of hours. I will still wait
for 1.4.0, though.

On Nov 18, 7:01 am, Nacho Coloma <[email protected]> wrote:
> > > I'm not entirely sure I understand the scope of the proposed patch.
> > > Are you thinking about adding filters at the DatastoreRecordReader
> > > level? It's not entirely clear to me that
> > > that provides a benefit over just applying the filter at the start of
> > > the map() function. Totally willing to believe I'm missing something,
> > > though.
> >
> > The map() filter runs against your quota. This is OK for once-only tasks
> > such as schema upgrades, but Mappers can also be used for repetitive tasks
> > such as mailing, data cleanup, etc. For these cases, being able to work on a
> > subset of data is important (process only user accounts with mailing
> > enabled, for example).
> >
> > The biggest problem to resolve is how to specify the filter clause in
> > mapreduce.xml. I am considering implementing a GQL parser, as simple as
> > possible, and injecting servlet request parameters. Something like:
> >
> > <property>
> > <name>mapreduce.mapper.inputformat.datastoreinputformat.query</name>
> > <value>select * from users where mailing=:value1 and
> > timestamp&lt;=:value2</value>
> > </property>
> >
> > This implies porting the GQL implementation from Python to Java, or
> > implementing an ANTLR-based parser. I feel like I am reinventing the wheel,
> > so any suggestion to use something that exists (or to aim for a simpler
> > design) is welcome.
> >
> > > On a logistical note, for nontrivial contributions, we require a CLA
> > > from either you or your employer (depending on who owns the copyright
> > > for your work) before we can accept significant contributions. The
> > > relevant forms are at:
> > > http://code.google.com/legal/individual-cla-v1.0.html
> > > and http://code.google.com/legal/corporate-cla-v1.0.html. Feel free to
> > > email me privately if this is an issue.
> >
> > No problem with that.
> >
> > Regards,
> >
> > Nacho.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine for Java" group.
> To post to this group, send email to
> [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine-java?hl=en.
>
>

