[ https://issues.apache.org/jira/browse/PHOENIX-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326564#comment-14326564 ]
ASF GitHub Bot commented on PHOENIX-900: ---------------------------------------- Github user elilevine commented on a diff in the pull request: https://github.com/apache/phoenix/pull/37#discussion_r24941161 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java --- @@ -78,7 +87,11 @@ private PhoenixConnection connection; private final long maxSize; private final ImmutableBytesPtr tempPtr = new ImmutableBytesPtr(); - private final Map<TableRef, Map<ImmutableBytesPtr,Map<PColumn,byte[]>>> mutations = Maps.newHashMapWithExpectedSize(3); // TODO: Sizing? + /** + * Use {@link TreeMap} for {@link #mutations} below to enforce a deterministic ordering for easier testing of partial failures. + * @see PartialCommitIT + */ + private final Map<TableRef, Map<ImmutableBytesPtr,RowMutationState>> mutations = Maps.newTreeMap(new TableRefComparator()); --- End diff -- Not sure I understand how the collection can be separated from MutationState in an integration test, since this collection is internal and determines the order of mutations before a CommitException is thrown. A core scenario tested in PartialCommitIT is one where there is a mix of mutations that succeed, mutations that fail, and mutations that are unprocessed (but would have succeed if executed in isolation) done on the same Connection.commit(). This scenario is important because of the way MutationState works: it separates mutations by table and sends each table's mutations to HBase iteratively, aborting with CommitException upon the first exception caught from HTableInteface.batch(). In order to write such a test we need to know upfront the order in which tables will be processed by MutationState, so sorting in the test is not enough IMHO. As you rightly point out in your previous comments, the number of of statements (and thus tables) will be 1 for most common use cases. Sure, in theory TreeSets are slower than HashSets, but seems unlikely to be an issue in practice. One more option that might be good would be to revert to using a HashMap and call Collections.sort() once during commit(). This avoid keeping a TreeMap sorted on every MutationState.join() in cases where the ratio of join() to commit() calls might be high. > Partial results for mutations > ----------------------------- > > Key: PHOENIX-900 > URL: https://issues.apache.org/jira/browse/PHOENIX-900 > Project: Phoenix > Issue Type: Bug > Affects Versions: 3.0.0, 4.0.0 > Reporter: Eli Levine > Assignee: Eli Levine > Attachments: PHOENIX-900.patch > > > HBase provides a way to retrieve partial results of a batch operation: > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List,%20java.lang.Object[]%29 > Chatted with James about this offline: > Yes, this could be included in the CommitException we throw > (MutationState:412). We already include the batches that have been > successfully committed to the HBase server in this exception. Would you be up > for adding this additional information? You'd want to surface this in a > Phoenix-y way in a method on CommitException, something like this: ResultSet > getPartialCommits(). You can easily create an in memory ResultSet using > MaterializedResultIterator plus the PhoenixResultSet constructor that accepts > this (just create a new empty PhoenixStatement with the PhoenixConnection for > the other arg). -- This message was sent by Atlassian JIRA (v6.3.4#6332)