> I'm not entirely sure I understand the scope of the proposed patch.
> Are you thinking about adding filters
> at the DatastoreRecordReader level? It's not entirely clear to me that
> that provides a benefit over just applying the filter at the start of
> the map() function. Totally willing to believe I'm missing something,
> though.
>
Filtering in map() still counts against your quota, since every entity is fetched before it can be discarded. This is OK for one-off tasks such as schema upgrades, but Mappers can also be used for repetitive tasks such as mailing, data cleanup, etc. For these cases, being able to work on a subset of the data is important (processing only user accounts with mailing enabled, for example).

The biggest problem to resolve is how to specify the filter clause in mapreduce.xml. I am considering implementing the simplest possible GQL parser and injecting servlet request parameters into the query. Something like:

  <property>
    <name>mapreduce.mapper.inputformat.datastoreinputformat.query</name>
    <value>select * from users where mailing=:value1 and timestamp&lt;=:value2</value>
  </property>

This implies porting the GQL implementation from Python to Java, or implementing an ANTLR-based parser. I feel like I am reinventing the wheel, so any suggestion to use something that exists (or to aim for a simpler design) is welcome.

> On a logistical note, for nontrivial contributions, we require a CLA
> from either you or your employer (depending on who owns the copyright
> for your work) before we can accept significant contributions. The
> relevant forms are at:
> http://code.google.com/legal/individual-cla-v1.0.html
> and http://code.google.com/legal/corporate-cla-v1.0.html. Feel free to
> email me privately if this is an issue.
>

No problem with that.

Regards,
Nacho.
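
For illustration, the query format proposed above could probably be handled without a full GQL port. What follows is only a minimal sketch in plain Java, not the patch itself. The class name SimpleFilterQuery, its parse method, and the supported grammar (a single kind, conditions joined by "and", each value given as a :name placeholder) are all assumptions made for this example. Bound values stay as strings, which is how servlet request parameters arrive; converting them to datastore types is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for queries of the form
 * "select * from <kind> [where <prop> <op> :<param> [and ...]]".
 * Not part of the App Engine SDK or the mapper library.
 */
final class SimpleFilterQuery {

    /** One condition, with the placeholder already replaced by its value. */
    static final class Filter {
        final String property;
        final String operator;
        final String value;

        Filter(String property, String operator, String value) {
            this.property = property;
            this.operator = operator;
            this.value = value;
        }
    }

    // Whole query: the kind is group 1, the optional where clause is group 2.
    private static final Pattern QUERY = Pattern.compile(
            "\\s*select\\s+\\*\\s+from\\s+(\\w+)(?:\\s+where\\s+(.+?))?\\s*",
            Pattern.CASE_INSENSITIVE);

    // Single condition: property, comparison operator, :placeholder name.
    private static final Pattern CONDITION = Pattern.compile(
            "\\s*(\\w+)\\s*(<=|>=|!=|=|<|>)\\s*:(\\w+)\\s*");

    final String kind;
    final List<Filter> filters;

    private SimpleFilterQuery(String kind, List<Filter> filters) {
        this.kind = kind;
        this.filters = Collections.unmodifiableList(filters);
    }

    /** Parses the query and binds each :name placeholder from params. */
    static SimpleFilterQuery parse(String query, Map<String, String> params) {
        Matcher m = QUERY.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported query: " + query);
        }
        List<Filter> filters = new ArrayList<>();
        String where = m.group(2);
        if (where != null) {
            // Only "and" is supported, so splitting on it is sufficient here.
            for (String part : where.split("(?i)\\s+and\\s+")) {
                Matcher c = CONDITION.matcher(part);
                if (!c.matches()) {
                    throw new IllegalArgumentException("Unsupported condition: " + part);
                }
                String value = params.get(c.group(3));
                if (value == null) {
                    throw new IllegalArgumentException("Missing parameter: " + c.group(3));
                }
                filters.add(new Filter(c.group(1), c.group(2), value));
            }
        }
        return new SimpleFilterQuery(m.group(1), filters);
    }
}
```

Given the example query and request parameters value1 and value2, parse would return the kind "users" and two filters (mailing = value1, timestamp <= value2), which the input format could then turn into datastore query filters. A regular expression covers this restricted grammar, but anything beyond "and"-joined comparisons (ordering, IN, nested expressions) would likely need the ANTLR route.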
