[ https://issues.apache.org/jira/browse/ACCUMULO-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated ACCUMULO-2519:
----------------------------------

    Fix Version/s: 1.5.2

> FATE operation failed across upgrade
> ------------------------------------
>
>                 Key: ACCUMULO-2519
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2519
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Keith Turner
>             Fix For: 1.5.2, 1.6.0
>
>
> While running the new upgrade script I noticed that a FATE operation failed. 
> I think this was caused by the package name changes in 1.6.  However, 
> executing FATE ops across an upgrade is probably not safe; it's certainly not 
> tested or easy to test.  We discussed this on IRC: the upgrade should probably 
> refuse to proceed if the FATE stack is not empty. 
> {noformat}
> 2014-03-20 18:20:40,724 [fate.Fate] ERROR: Thread "Repo runner 0" died 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.server.master.tableOps.TraceRepo
> java.lang.RuntimeException: java.lang.RuntimeException: 
> java.lang.ClassNotFoundException: 
> org.apache.accumulo.server.master.tableOps.TraceRepo
>         at org.apache.accumulo.fate.ZooStore.top(ZooStore.java:266)
>         at org.apache.accumulo.fate.AgeOffStore.top(AgeOffStore.java:172)
>         at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:58)
>         at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:701)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.server.master.tableOps.TraceRepo
>         at org.apache.accumulo.fate.ZooStore.deserialize(ZooStore.java:79)
>         at org.apache.accumulo.fate.ZooStore.top(ZooStore.java:262)
>         ... 4 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.accumulo.server.master.tableOps.TraceRepo
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>         at 
> org.apache.accumulo.start.classloader.AccumuloClassLoader$2.loadClass(AccumuloClassLoader.java:278)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:270)
>         at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:624)
>         at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1611)
>         at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>         at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>         at org.apache.accumulo.fate.ZooStore.deserialize(ZooStore.java:77)
>         ... 5 more
> {noformat}
> IRC conversation:
> {noformat}
> <busbey> hurm. so how useful would a test set that injects faults into the 
> !METADATA table be?
> <busbey> or into FATE
> <busbey> for that matter
> <busbey> to make sure that we have sufficient failure handling to avoid 
> catastrophic loss
> <kturner> I think I saw a FATE related bug in the logs also
> <kturner> FATE serializes classes and pushes them on a stack in zookeeper
> <kturner> in 1.6 package names were changed, so things could not deserialize
> <busbey> oh boy
> <busbey> that's not good
> <busbey> so like they were serialized while the cluster was 1.5?
> <busbey> and then post upgrade explosions?
> <elserj> sounds like it
> <busbey> were package names changed 1.4 -> 1.5 related to fate?
> <kturner> yep
> <busbey> because in theory
> <busbey> I could have a 1.4 cluster
> <elserj> almost want to preserve classes which were renamed as deprecated
> <busbey> that I upgrade to 1.5 and then 1.6
> <busbey> and I could, in theory not allow enough time for FATE to clear out 
> in the mean
> <busbey> well, or provide some kind of transition jar
> <busbey> that includes classes to allow for burn off
> <busbey> that you could later remove
> <busbey> this sounds like a blocker
> <busbey> barring some kind of documentation we could do
> <busbey> for safely shutting down a cluster in prep for an upgrade
> <busbey> the monitor doesn't show any indicators for waiting FATE operations, 
> does it?
> <kturner> no
> <kturner> maybe 1.6 could refuse to upgrade if the FATE queue is not empty
> <busbey> filed ACCUMULO-2517
> <busbey> well
> <busbey> 1) was this also a problem doing 1.4 -> 1.5?
> <busbey> and we just haven't had anyone hit it yet?
> <elserj> do you have an idea of how many renames this introduces, keith?
> <busbey> 2) that sounds like a good idea
> <busbey> as a first check, then just say "please start up the master under 
> PREV_VERSION" and wait for FATE to clear
> <kturner> we could do the same thing for 1.5
> <busbey> with a ref to upgrade notes that explain how to check if FATE is 
> clear?
> <kturner> yeah
> <busbey> that will require we finish ACCUMULO-2469, I presume?
> <busbey> (that's the ticket for documenting how to access zookeeper)
> <busbey> two additional tickets or one?
> <elserj> there's a class that will print fate ops
> <busbey> 1) upgrade instructions should include how to check if there are 
> fate operations pending
> <busbey> 2) upgrade code should refuse to upgrade if there are fate operations 
> pending
> <busbey> nice! we could use that and leave 2469 for later, then?
> <ctubbsii_bot> https://issues.apache.org/jira/browse/ACCUMULO-2469
> <elserj> ctubbsii_bot you need to trim punctuation
> * murraju ([email protected]) has joined #accumulo
> <busbey> do those two sound like they cover the FATE bug?
> <busbey> I presume we don't know enough yet to make a call on the delete 
> marker thing?
> <busbey> and that any additional guards on the GC should be aiming for 
> post-1.6?
> <kturner> I am creating a ticket, any problem w/ me just plopping this 
> conversation onto the ticket?
> <busbey> sounds good
> <kturner> elserj?
> <elserj> oh, sure
> {noformat}
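
The second proposal in the conversation above (refuse to upgrade while FATE operations are pending) might look roughly like the sketch below. This is only an illustration under stated assumptions, not Accumulo's actual upgrade code: `PendingTxSource`, `FateUpgradeGuard`, and `checkNoPendingFate` are hypothetical names, and the in-memory lists stand in for whatever enumerates the transaction ids stored in ZooKeeper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for whatever lists the FATE transaction ids
// persisted in ZooKeeper; this is NOT an Accumulo interface.
interface PendingTxSource {
  List<Long> listPendingTxIds();
}

class FateUpgradeGuard {

  // Throws if any FATE transaction is still outstanding, so the caller can
  // abort the upgrade before trying to deserialize operations written by
  // the previous version.
  static void checkNoPendingFate(PendingTxSource source, String prevVersion) {
    List<Long> pending = source.listPendingTxIds();
    if (!pending.isEmpty()) {
      throw new IllegalStateException("Refusing to upgrade: " + pending.size()
          + " FATE operation(s) pending. Start the master under " + prevVersion
          + " and wait for FATE to clear before upgrading.");
    }
  }

  public static void main(String[] args) {
    // Empty FATE store: the check passes silently.
    checkNoPendingFate(() -> Collections.<Long>emptyList(), "1.5.1");

    // Two outstanding transactions (made-up ids): the check refuses.
    try {
      checkNoPendingFate(() -> Arrays.asList(1L, 2L), "1.5.1");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing before any upgrade work starts matches the suggestion in the log to tell the operator to restart the master under the previous version and let FATE drain, rather than hitting a `ClassNotFoundException` after the upgrade.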



--
This message was sent by Atlassian JIRA
(v6.2#6252)
