[ 
https://issues.apache.org/jira/browse/PHOENIX-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326564#comment-14326564
 ] 

ASF GitHub Bot commented on PHOENIX-900:
----------------------------------------

Github user elilevine commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/37#discussion_r24941161
  
    --- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java ---
    @@ -78,7 +87,11 @@
         private PhoenixConnection connection;
         private final long maxSize;
         private final ImmutableBytesPtr tempPtr = new ImmutableBytesPtr();
    -    private final Map<TableRef, 
Map<ImmutableBytesPtr,Map<PColumn,byte[]>>> mutations = 
Maps.newHashMapWithExpectedSize(3); // TODO: Sizing?
    +    /**
    +     * Use {@link TreeMap} for {@link #mutations} below to enforce a 
deterministic ordering for easier testing of partial failures.
    +     * @see PartialCommitIT
    +     */
    +    private final Map<TableRef, Map<ImmutableBytesPtr,RowMutationState>> 
mutations = Maps.newTreeMap(new TableRefComparator());
    --- End diff --
    
    Not sure I understand how the collection can be separated from 
MutationState in an integration test, since this collection is internal and 
determines the order of mutations before a CommitException is thrown.
    
    A core scenario tested in PartialCommitIT is one where there is a mix of 
mutations that succeed, mutations that fail, and mutations that are unprocessed 
(but would have succeed if executed in isolation) done on the same 
Connection.commit(). This scenario is important because of the way 
MutationState works: it separates mutations by table and sends each table's 
mutations to HBase iteratively, aborting with CommitException upon the first 
exception caught from HTableInteface.batch(). In order to write such a test we 
need to know upfront the order in which tables will be processed by 
MutationState, so sorting in the test is not enough IMHO.
    
    As you rightly point out in your previous comments, the number of of 
statements (and thus tables) will be 1 for most common use cases. Sure, in 
theory TreeSets are slower than HashSets, but seems unlikely to be an issue in 
practice.
    
    One more option that might be good would be to revert to using a HashMap 
and call Collections.sort() once during commit(). This avoid keeping a TreeMap 
sorted on every MutationState.join() in cases where the ratio of join() to 
commit() calls might be high.


> Partial results for mutations
> -----------------------------
>
>                 Key: PHOENIX-900
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-900
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: Eli Levine
>            Assignee: Eli Levine
>         Attachments: PHOENIX-900.patch
>
>
> HBase provides a way to retrieve partial results of a batch operation: 
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List,%20java.lang.Object[]%29
> Chatted with James about this offline:
> Yes, this could be included in the CommitException we throw 
> (MutationState:412). We already include the batches that have been 
> successfully committed to the HBase server in this exception. Would you be up 
> for adding this additional information? You'd want to surface this in a 
> Phoenix-y way in a method on CommitException, something like this: ResultSet 
> getPartialCommits(). You can easily create an in memory ResultSet using 
> MaterializedResultIterator plus the PhoenixResultSet constructor that accepts 
> this (just create a new empty PhoenixStatement with the PhoenixConnection for 
> the other arg).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to