[ 
https://issues.apache.org/jira/browse/HIVE-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Camille Marvin updated HIVE-6918:
---------------------------------

    Description: 
An HDINSIGHT customer is doing some heavy metadata operations using an embedded 
metastore. They get an error with a duplicate key in a metastore table 
'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE concurrently 
(on different tables) using the same metastore database, but with each job 
having an embedded metastore because they set hive.metastore.uris to the empty 
string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS 
InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS 
PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') 
LOCATION 
'wasb://inputb...@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 
'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The 
duplicate key value is (209703).
        at 
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
        at 
com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
        at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
        at 
org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
        at 
org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
        at 
org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
        at 
org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
        at 
org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
        at 
org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...

  was:
An HDINSIGHT customer is doing some heavy metadata operations using an embedded 
metastore. They get an error with a duplicate key in a metastore table 
'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE concurrently 
(on different tables, I believe) using the same metastore database, but with 
each job having an embedded metastore because they set hive.metastore.uris to 
the empty string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS 
InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 (
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT 
EXISTS PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') 
LOCATION 
'wasb://instancespaceb...@advisor27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 
'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The 
duplicate key value is (209703).
        at 
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
        at 
com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
        at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
        at 
org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
        at 
org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
        at 
org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
        at 
org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
        at 
org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
        at 
org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...



removing product-specific attributes from script, clarifying uncertainty (yes, 
it is separate tables)

> ALTER TABLE using embedded metastore fails with duplicate key violation in 
> 'dbo.SERDES'
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-6918
>                 URL: https://issues.apache.org/jira/browse/HIVE-6918
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11.0
>         Environment: hive-0.11.0.1.3.7.0-01272
> HDInsight version: 2.1.4.0.661685
>            Reporter: Eric Hanson
>
> An HDINSIGHT customer is doing some heavy metadata operations using an 
> embedded metastore. They get an error with a duplicate key in a metastore 
> table 'dbo.SERDES'. They have multiple concurrent jobs doing ALTER TABLE 
> concurrently (on different tables) using the same metastore database, but 
> with each job having an embedded metastore because they set 
> hive.metastore.uris to the empty string.
> The script looks like:
> set hive.metastore.uris=;
> ...
> CREATE EXTERNAL TABLE IF NOT EXISTS 
> InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
> ...
> )
> PARTITIONED BY (tenant string, d string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> ;
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
> ... (several more like this);
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS 
> PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') 
> LOCATION 
> 'wasb://inputb...@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
> ... several more like the above (14 ALTER TABLE statements in a row)
> ...
> Then they get this error:
> ...
>       at java.lang.reflect.Method.invoke(Method.java:601)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> NestedThrowablesStackTrace:
> java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 
> 'PK_serdes_SERDE_ID'. Cannot insert duplicate key in object 'dbo.SERDES'. The 
> duplicate key value is (209703).
>       at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
>       at 
> com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
>       at 
> org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
>       at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
>       at 
> org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
>       at 
> org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
>       at 
> org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
>       at 
> org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
>       at 
> org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
> ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to