Jira Created: https://issues.apache.org/jira/browse/PHOENIX-2367 I will submit a patch for review soon.
On Tue, Nov 3, 2015 at 10:52 AM, Jan Fernando <jferna...@salesforce.com> wrote:
> +1 on making this change. Can you file a JIRA for it?
>
> On Mon, Nov 2, 2015 at 4:31 PM, Siddhi Mehta <sm26...@gmail.com> wrote:
> >
> > Hey All,
> >
> > I wanted to add a notion of skipping invalid rows for PhoenixHBaseStorage,
> > similar to how the CSVBulkLoad tool has an option of ignoring bad rows.
> > I did some work on the Apache Pig code that allows Storers to have a
> > notion of customizable/configurable error handling: PIG-4704
> > <https://issues.apache.org/jira/browse/PIG-4704>.
> >
> > I want to plug this behavior into PhoenixHBaseStorage and propose certain
> > changes for the same.
> >
> > *Current Behavior/Problem:*
> >
> > PhoenixRecordWriter uses executeBatch() to process rows once the batch
> > size is reached. If there are any client-side validation/syntactic
> > errors, such as data not fitting the column size, executeBatch() throws
> > an exception and there is no way to retrieve the valid rows from the
> > batch and retry them. We discard the whole batch or fail the job without
> > any error handling.
> >
> > With auto-commit set to false, execute() also serves the purpose of not
> > making any RPC calls: it performs a bunch of client-side validation and
> > adds the row to the client-side cache of mutations.
> >
> > Only on conn.commit() do we make an RPC call.
> >
> > *Proposed Change*
> >
> > To be able to use configurable error handling and ignore only the failed
> > records instead of discarding the whole batch, I propose changing the
> > behavior in PhoenixRecordWriter from executeBatch() to execute(), or
> > having a configuration to toggle between the two behaviors.
> >
> > Thoughts?
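
For anyone following along, the difference the thread is describing can be sketched roughly as below. This is a standalone simulation, not the real PhoenixRecordWriter or Phoenix JDBC classes: the validate() stand-in plays the role of the client-side validation Phoenix performs on execute()/executeBatch() (e.g. data exceeding the declared column size), and the two write methods contrast the current all-or-nothing batch failure with the proposed per-row execute() plus error handling.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipInvalidRowsSketch {

    // Stand-in for Phoenix's client-side validation, e.g. rejecting a
    // value that does not fit the column size. Hypothetical rule: any
    // value longer than 5 characters is "invalid".
    static void validate(String row) {
        if (row.length() > 5) {
            throw new IllegalArgumentException("value too long: " + row);
        }
    }

    // Current batch-style behavior: the first bad row aborts the whole
    // batch, and the valid rows in it cannot be recovered or retried.
    static List<String> writeBatch(List<String> rows) {
        for (String row : rows) {
            validate(row); // first failure throws; everything is discarded
        }
        return new ArrayList<>(rows);
    }

    // Proposed per-row behavior: validate each row individually (as
    // execute() with auto-commit off would), skip and record invalid
    // rows, and keep the valid ones for the eventual commit().
    static List<String> writePerRow(List<String> rows, List<String> errors) {
        List<String> accepted = new ArrayList<>();
        for (String row : rows) {
            try {
                validate(row);
                accepted.add(row); // would land in the client mutation cache
            } catch (IllegalArgumentException e) {
                errors.add(row); // hook for a configurable error handler
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("ok1", "way-too-long-value", "ok2");

        try {
            writeBatch(rows);
        } catch (IllegalArgumentException e) {
            System.out.println("batch: entire batch discarded");
        }

        List<String> errors = new ArrayList<>();
        List<String> accepted = writePerRow(rows, errors);
        System.out.println("per-row: accepted=" + accepted.size()
                + " skipped=" + errors.size());
    }
}
```

In the per-row variant the two valid rows survive and only the bad one is routed to the error handler, which is what PIG-4704's configurable error handling needs in order to work with PhoenixHBaseStorage.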