Re: [appengine-java] Re: mapreduce - passing filters

Nacho Coloma Thu, 18 Nov 2010 04:01:50 -0800

> I'm not entirely sure I understand the scope of the proposed patch. Are
> you thinking about adding filters at the DatastoreRecordReader level?
> It's not entirely clear to me that that provides a benefit over just
> applying the filter at the start of the map() function. Totally willing
> to believe I'm missing something, though.


A filter applied inside map() still counts against your quota, because
every entity is read before it can be discarded. This is OK for one-off
tasks such as schema upgrades, but Mappers can also be used for repetitive
tasks such as mailings, data cleanup, etc. For these cases, being able to
work on a subset of the data is important (processing only user accounts
with mailing enabled, for example).

The biggest problem to resolve is how to specify the filter clause in
mapreduce.xml. I am considering implementing the simplest possible GQL
parser and injecting servlet request parameters into the query. Something
like:

<property>
  <name>mapreduce.mapper.inputformat.datastoreinputformat.query</name>
  <value>select * from users where mailing=:value1 and
  timestamp&lt;=:value2</value>
</property>

This implies porting the GQL implementation from Python to Java, or
implementing an ANTLR-based parser. I feel like I am reinventing the wheel,
so any suggestions to use something that already exists (or to aim for a
simpler design) are welcome.
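
To make the "simpler design" option concrete, here is a rough sketch of
what a minimal parser could look like if it only supported
"select * from <kind> where <property> <op> :<param> and ...". This is an
assumption about a possible design, not existing code: the class name
SimpleFilterParser, its nested types, and the sample parameter values are
all hypothetical, and the parameter map stands in for servlet request
parameters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the mapreduce library.
final class SimpleFilterParser {

    /** One "property operator value" clause after parameter binding. */
    static final class Filter {
        final String property;
        final String operator;
        final String value;

        Filter(String property, String operator, String value) {
            this.property = property;
            this.operator = operator;
            this.value = value;
        }
    }

    /** The entity kind plus its bound filters. */
    static final class ParsedQuery {
        final String kind;
        final List<Filter> filters;

        ParsedQuery(String kind, List<Filter> filters) {
            this.kind = kind;
            this.filters = filters;
        }
    }

    // "select * from <kind>" with an optional "where ..." tail.
    private static final Pattern QUERY = Pattern.compile(
            "\\s*select\\s+\\*\\s+from\\s+(\\w+)(?:\\s+where\\s+(.+?))?\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // "<property> <op> :<param>"; two-character operators are tried first.
    private static final Pattern CLAUSE = Pattern.compile(
            "\\s*(\\w+)\\s*(<=|>=|!=|<|>|=)\\s*:(\\w+)\\s*");

    /** Parses the query and replaces each :name with params.get(name). */
    static ParsedQuery parse(String query, Map<String, String> params) {
        Matcher q = QUERY.matcher(query);
        if (!q.matches()) {
            throw new IllegalArgumentException("Unsupported query: " + query);
        }
        List<Filter> filters = new ArrayList<>();
        String where = q.group(2);
        if (where != null) {
            // Only AND is supported in this sketch (no OR, no parentheses).
            for (String clause : where.split("(?i)\\s+and\\s+")) {
                Matcher c = CLAUSE.matcher(clause);
                if (!c.matches()) {
                    throw new IllegalArgumentException("Bad clause: " + clause);
                }
                String value = params.get(c.group(3));
                if (value == null) {
                    throw new IllegalArgumentException(
                            "Missing parameter: " + c.group(3));
                }
                filters.add(new Filter(c.group(1), c.group(2), value));
            }
        }
        return new ParsedQuery(q.group(1), filters);
    }

    public static void main(String[] args) {
        // Stand-in for servlet request parameters; values are made up.
        Map<String, String> params = new HashMap<>();
        params.put("value1", "true");
        params.put("value2", "1290081710");
        ParsedQuery parsed = parse(
                "select * from users where mailing=:value1 and\n"
                        + "timestamp<=:value2", params);
        for (Filter f : parsed.filters) {
            System.out.println(parsed.kind + ": " + f.property + " "
                    + f.operator + " " + f.value);
        }
    }
}
```

Each resulting Filter could then presumably be translated into a filter on
the datastore Query before the input is split. The sketch keeps every bound
value as a String, so converting values to the property's real type
(boolean, date, number) is left open, and that is probably where most of
the remaining design work would be.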

> On a logistical note, for nontrivial contributions, we require a CLA
> from either you or your employer (depending on who owns the copyright
> for your work) before we can accept significant contributions. The
> relevant forms are at:
> http://code.google.com/legal/individual-cla-v1.0.html
> and http://code.google.com/legal/corporate-cla-v1.0.html. Feel free to
> email me privately if this is an issue.
>

No problem with that.

Regards,

Nacho.

