[ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547920#comment-13547920
 ] 

Hudson commented on HIVE-3826:
------------------------------

Integrated in Hive-trunk-hadoop2 #54 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
    HIVE-3826 Rollbacks and retries of drops cause 
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)
(Kevin Wilfong via namit) (Revision 1425247)

     Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425247
Files : 
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

                
> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3826
>                 URL: https://issues.apache.org/jira/browse/HIVE-3826
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>             Fix For: 0.11.0
>
>         Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be a drop command failing 
> and then being retried by the client.
> Focusing on a single metastore thread with DEBUG-level logging, I saw that 
> the objects intended to be dropped remained in the PersistenceManager cache 
> even after a rollback.  The steps seemed to be as follows:
> 1) On the first attempt to drop the table, the table is pulled into the 
> PersistenceManager cache so that it can be dropped
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, which 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server, or a different server, and succeeds
> 4) Back on the original thread of the original Thrift server, someone 
> performs a write operation that produces a commit.  This causes the 
> detached objects related to the dropped table to attempt to reattach, so 
> JDO queries the SQL backend for those objects, which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread.  I hard-coded a RuntimeException into the ObjectStore code 
> that drops a table, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback.  I also turned off all 
> retries at all layers of the metastore.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard-coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed.  I'm not 
> sure why this was necessary, but it didn't work without it; it seemed to 
> have an effect on the order in which objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist; this would 
> fail with the NucleusObjectNotFoundException
> The object that caused the exception varied; I saw the MTable, the 
> MSerDeInfo, and the MTablePrivilege from the table whose drop was attempted.
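
The sequence described above can be sketched as a small toy model. This is
only an illustration of the suspected mechanism, not DataNucleus or Hive
code: Backend, Session, replay and the evictOnRollback flag are hypothetical
names invented for this sketch, and clearing the cache on rollback is shown
as one plausible mitigation, not as a description of the attached patch.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class StaleCacheDemo {

    /** Stand-in for the shared SQL backend: just the set of rows that exist. */
    static final class Backend {
        final Set<String> rows = new HashSet<>();
    }

    /** Toy per-thread session with a cache; NOT the JDO PersistenceManager API. */
    static final class Session {
        private final Backend backend;
        private final boolean evictOnRollback; // hypothetical switch for this sketch
        private final Set<String> cache = new LinkedHashSet<>();

        Session(Backend backend, boolean evictOnRollback) {
            this.backend = backend;
            this.evictOnRollback = evictOnRollback;
        }

        /** Pulls a row into the cache, as the first drop attempt does. */
        void load(String key) {
            if (!backend.rows.contains(key)) {
                throw new IllegalStateException("No such database row: " + key);
            }
            cache.add(key);
        }

        /** Rolls back; the cache survives unless eviction is enabled. */
        void rollback() {
            if (evictOnRollback) {
                cache.clear();
            }
        }

        /** Commit re-checks every cached object against the backend first. */
        void commit(String newRow) {
            for (String key : cache) {
                if (!backend.rows.contains(key)) {
                    throw new IllegalStateException("No such database row: " + key);
                }
            }
            backend.rows.add(newRow);
        }
    }

    /** Replays the reported sequence; returns null on success, else the error. */
    static String replay(boolean evictOnRollback) {
        Backend backend = new Backend();
        Session hive1 = new Session(backend, evictOnRollback);
        backend.rows.add("t1");      // CREATE TABLE t1
        hive1.load("t1");            // Hive1: DROP TABLE t1 loads the table...
        hive1.rollback();            // ...then fails and rolls back
        backend.rows.remove("t1");   // Hive2: DROP TABLE t1 succeeds
        try {
            hive1.commit("d2");      // Hive1: CREATE DATABASE d2
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("without eviction: " + replay(false));
        System.out.println("with eviction:    " + replay(true));
    }
}
```

In this model the commit on Hive1 fails only when the rolled-back objects are
left in the session cache, which matches the behaviour reported above; it
does not model the CREATE DATABASE d1 step, whose role is unclear.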

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
