[ https://issues.apache.org/jira/browse/IMPALA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215275#comment-17215275 ]
ASF subversion and git services commented on IMPALA-10243:
----------------------------------------------------------
Commit 6542b6070d0dd4237e65943d09503282c24ccd84 in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6542b60 ]
IMPALA-10243: ConcurrentModificationException during parallel INSERTs
Impala might throw a ConcurrentModificationException during a high
load of INSERTs to the same table. The exception happens during thrift
serialization of TUpdateCatalogResponse, which holds a reference to the
metastore table. The serialization happens without a lock, so another
thread might modify the metastore table object in the meantime.
Such a modification can happen in CatalogOpExecutor.updateCatalog(),
which updates the catalog version and unsets table column statistics.
For some reason I only saw this error with local catalog.
The problem is that in Table.toThrift() we set a reference to the
metastore table object instead of deep copying it. My fix is to deep
copy the metastore table, so the object being serialized can no longer
be modified by another thread.
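The difference between handing out a reference and handing out a deep copy
can be sketched in a few lines of Java. This is an illustration only, not the
Impala code: FakeTable, serialize() and the parameter names are hypothetical
stand-ins, and the "other thread" is simulated by a hook that runs in the
middle of the iteration so the failure is deterministic. In the real code the
metastore Table is a Thrift-generated class, and such classes normally provide
a copy constructor and deepCopy() for this purpose.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class DeepCopySketch {
    /** Hypothetical stand-in for the metastore table: just a parameter map. */
    static final class FakeTable {
        final Map<String, String> parameters = new HashMap<>();

        FakeTable() {}

        /** Copy constructor: the new object gets its own map. */
        FakeTable(FakeTable other) {
            parameters.putAll(other.parameters);
        }
    }

    /**
     * Mimics a serializer walking the parameter map. The hook runs after the
     * first entry has been read and stands in for a concurrent writer.
     */
    static int serialize(FakeTable table, Runnable concurrentWriter) {
        int written = 0;
        for (Map.Entry<String, String> entry : table.parameters.entrySet()) {
            if (written == 0) {
                concurrentWriter.run();
            }
            written++;
        }
        return written;
    }

    static FakeTable newTable() {
        FakeTable t = new FakeTable();
        t.parameters.put("numRows", "10");
        t.parameters.put("totalSize", "2048");
        return t;
    }

    /** Serializes the shared object itself while it is being modified. */
    static boolean failsWhenShared() {
        FakeTable shared = newTable();
        try {
            serialize(shared, () -> shared.parameters.put("catalogVersion", "42"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Serializes a private copy; the writer only touches the original. */
    static int entriesWrittenFromCopy() {
        FakeTable shared = newTable();
        FakeTable copy = new FakeTable(shared);
        return serialize(copy, () -> shared.parameters.put("catalogVersion", "42"));
    }

    public static void main(String[] args) {
        System.out.println("shared reference failed: " + failsWhenShared());
        System.out.println("entries written from copy: " + entriesWrittenFromCopy());
    }
}
```

The copy costs an extra allocation per response, but it lets the serialization
stay lock-free because no other thread holds a reference to the copied object.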
Testing
* added stress test 'test_insert_stress.py'
Change-Id: Ie656925d764d5eb26c318703ca425529ecf7a3a3
Reviewed-on: http://gerrit.cloudera.org:8080/16602
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> ConcurrentModificationException during parallel INSERTs
> -------------------------------------------------------
>
> Key: IMPALA-10243
> URL: https://issues.apache.org/jira/browse/IMPALA-10243
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
>
> Impala might throw a ConcurrentModificationException during a high load of
> INSERTs to the same table.
> The exception happens during thrift serialization of TUpdateCatalogResponse,
> which holds a reference to the metastore table. The serialization happens
> without a lock, so another thread might modify the metastore table object in
> the meantime. Such a modification can happen in
> CatalogOpExecutor.updateCatalog(), which updates the catalog version and
> unsets table column statistics. A high load of INSERT statements increases
> the probability of the concurrent modification.
> I think the problem is that in Table.toThrift() we set a reference to the
> metastore table object instead of deep copying it:
> [https://github.com/apache/impala/blob/481ea4ab0d476a4aa491f99c2a4e376faddc0b03/fe/src/main/java/org/apache/impala/catalog/Table.java#L505]
> The stack trace looks like the following:
> [1] java.util.HashMap$HashIterator.nextNode (HashMap.java:1,445)
> [2] java.util.HashMap$EntryIterator.next (HashMap.java:1,479)
> [3] java.util.HashMap$EntryIterator.next (HashMap.java:1,477)
> [4] org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.write (Table.java:2,641)
> [5] org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.write (Table.java:2,324)
> [6] org.apache.hadoop.hive.metastore.api.Table.write (Table.java:2,082)
> [7] org.apache.impala.thrift.TTable$TTableStandardScheme.write (TTable.java:1,829)
> [8] org.apache.impala.thrift.TTable$TTableStandardScheme.write (TTable.java:1,569)
> [9] org.apache.impala.thrift.TTable.write (TTable.java:1,357)
> [10] org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write (TCatalogObject.java:1,433)
> [11] org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write (TCatalogObject.java:1,272)
> [12] org.apache.impala.thrift.TCatalogObject.write (TCatalogObject.java:1,086)
> [13] org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write (TCatalogUpdateResult.java:908)
> [14] org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write (TCatalogUpdateResult.java:780)
> [15] org.apache.impala.thrift.TCatalogUpdateResult.write (TCatalogUpdateResult.java:682)
> [16] org.apache.impala.thrift.TUpdateCatalogResponse$TUpdateCatalogResponseStandardScheme.write (TUpdateCatalogResponse.java:363)
> [17] org.apache.impala.thrift.TUpdateCatalogResponse$TUpdateCatalogResponseStandardScheme.write (TUpdateCatalogResponse.java:325)
> [18] org.apache.impala.thrift.TUpdateCatalogResponse.write (TUpdateCatalogResponse.java:273)
> [19] org.apache.thrift.TSerializer.serialize (TSerializer.java:79)
> [20] org.apache.impala.service.JniCatalog.updateCatalog (JniCatalog.java:314)