+1 on making this change. Can you file a JIRA for it?

On Mon, Nov 2, 2015 at 4:31 PM, Siddhi Mehta <sm26...@gmail.com> wrote:
> Hey All,
>
> I wanted to add a notion of skipping invalid rows to PhoenixHbaseStorage,
> similar to how the CSVBulkLoad tool has an option to ignore bad rows. I
> did some work on the Apache Pig code that allows Storers to have a notion
> of customizable/configurable error handling: PIG-4704
> <https://issues.apache.org/jira/browse/PIG-4704>.
>
> I want to plug this behavior into PhoenixHbaseStorage and propose the
> following changes.
>
> *Current Behavior/Problem:*
>
> PhoenixRecordWriter uses executeBatch() to process rows once the batch
> size is reached. If there are any client-side validation/syntactic errors,
> such as data not fitting the column size, executeBatch() throws an
> exception and there is no way to retrieve the valid rows from the batch
> and retry them. We discard the whole batch or fail the job without any
> error handling.
>
> With auto-commit set to false, execute() also serves the purpose of not
> making any RPC calls: it performs the client-side validation and adds the
> row to the client-side cache of mutations.
>
> On conn.commit() we make an RPC call.
>
> *Proposed Change:*
>
> To be able to use configurable error handling and ignore only the failed
> records instead of discarding the whole batch, I propose changing the
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or
> adding a configuration option to toggle between the two behaviors.
>
> Thoughts?
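For context, a minimal sketch of the per-row execute() path being proposed. The Phoenix classes themselves aren't shown; a stub validate() stands in for stmt.execute() (client-side validation with auto-commit off), and the class, method, and column-size values here are hypothetical, only the control flow is the point:

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

// Sketch: skip individual invalid rows instead of failing the whole batch.
// In the real writer, validate(row) would be stmt.execute() on the upsert
// PreparedStatement with auto-commit off, and conn.commit() would flush the
// surviving mutations at the batch boundary.
public class SkipInvalidRowsSketch {

    static final int MAX_COLUMN_SIZE = 10; // hypothetical VARCHAR(10) column

    // Stands in for stmt.execute(): with auto-commit off this validates
    // client-side and caches the mutation without making an RPC.
    static void validate(String value) throws SQLException {
        if (value.length() > MAX_COLUMN_SIZE) {
            throw new SQLException("value too long for column: " + value);
        }
    }

    // Returns {written, skipped} so callers can report skipped-row counts.
    static int[] writeSkippingInvalid(List<String> rows) {
        int written = 0, skipped = 0;
        for (String row : rows) {
            try {
                validate(row);   // stmt.execute() in the real writer
                written++;       // row is now in the client mutation cache
            } catch (SQLException e) {
                skipped++;       // only this row is lost; the batch survives
            }
        }
        // conn.commit() would flush the surviving rows in one RPC here
        return new int[] { written, skipped };
    }

    public static void main(String[] args) {
        int[] r = writeSkippingInvalid(
                Arrays.asList("ok", "also-ok", "way-too-long-for-column", "fine"));
        System.out.println("written=" + r[0] + " skipped=" + r[1]);
    }
}
```

With executeBatch(), the same oversized row would instead surface as a single exception for the whole batch, with no standard way to recover the valid rows, which is the problem described above.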