[ 
https://issues.apache.org/jira/browse/UIMA-6162?focusedWorklogId=361679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361679
 ]

ASF GitHub Bot logged work on UIMA-6162:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Dec/19 14:49
            Start Date: 20/Dec/19 14:49
    Worklog Time Spent: 10m 
      Work Description: reckart commented on pull request #16: [UIMA-6162] 
Concurrent binary serialization produces corrupt output
URL: https://github.com/apache/uima-uimaj/pull/16
 
 
   - Unit test which triggers the concurrent serialization data corruption 
situation
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 361679)
    Remaining Estimate: 0h
            Time Spent: 10m

> Concurrent binary serialization produces corrupt output
> -------------------------------------------------------
>
>                 Key: UIMA-6162
>                 URL: https://issues.apache.org/jira/browse/UIMA-6162
>             Project: UIMA
>          Issue Type: Bug
>          Components: UIMA
>    Affects Versions: 3.1.1SDK
>            Reporter: Richard Eckart de Castilho
>            Priority: Major
>         Attachments: admin.ser
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I suspect there could be an issue in `BinaryCasSerDes`.
> When deserializing the attached file `admin.ser`, I get this stack trace:
> {code:java}
> Caused by: java.lang.ClassCastException: class org.apache.uima.jcas.tcas.Annotation cannot be cast to class org.apache.uima.jcas.cas.Sofa (org.apache.uima.jcas.tcas.Annotation and org.apache.uima.jcas.cas.Sofa are in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @4593ff34)
>     at org.apache.uima.cas.impl.BinaryCasSerDes.makeSofaFromHeap(BinaryCasSerDes.java:1823) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.getSofaFromAnnotBase(BinaryCasSerDes.java:1817) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.createFSsFromHeaps(BinaryCasSerDes.java:1701) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:259) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:328) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.Serialization.deserializeCASComplete(Serialization.java:129) ~[uimaj-core-3.1.1.jar:3.1.1]
> {code}
> The code used to read the file before deserializing is as follows:
> {code:java}
>     public static void readSerializedCas(CAS aCas, File aFile)
>         throws IOException
>     {
>         try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(aFile))) {
>             CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
>             deserializeCASComplete(serializer, (CASImpl) aCas);
>         }
>         catch (ClassNotFoundException e) {
>             throw new IOException(e);
>         }
>     }
> {code}
> I set a breakpoint at BinaryCasSerDes:1608, which is a for loop iterating over 
> the heap. Apparently, the first feature structure encountered is an 
> annotation, NOT the SOFA. Then in line 1700, the deserializer tries to 
> resolve the SOFA for this annotation but fails because the SOFA has not yet 
> been deserialized. Eventually makeSofaFromHeap is called and checks whether a 
> SOFA needs to be created. It looks up the SOFA's ID (1) via 
> csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821), finds nothing, and 
> generates a new SOFA. 
> However, when the SECOND annotation is read and csds.addr2fs.get(sofaAddr) 
> (BinaryCasSerDes:1821) is called again to resolve the SOFA at addr 1, it 
> returns the previously deserialized annotation instead of the SOFA that had 
> been created.
> The implicitly created SOFA is added to the csds.addr2fs map at key 1. 
> However, later in BinaryCasSerDes:1723, key 1 is overwritten by the 
> deserialized annotation:
> {code}
>         if (!isSofa) { // if it was a sofa, other code added or pended it
>           // this overwrites the SOFA that was created at key 1 because heapIndex is also 1
>           csds.addFS(fs, heapIndex);
>         }
> {code}
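
The overwrite described above can be reproduced in isolation. The following is only a minimal sketch, not the UIMA code: the classes `FeatureStructure`, `Sofa` and `Annotation` and the plain `HashMap` named `addr2fs` are hypothetical stand-ins for the real types and for `csds.addr2fs`, assuming the map is keyed by heap address.

```java
import java.util.HashMap;
import java.util.Map;

class Addr2FsOverwriteSketch {
    // Hypothetical stand-ins for the UIMA feature structure classes.
    static class FeatureStructure {}
    static class Sofa extends FeatureStructure {}
    static class Annotation extends FeatureStructure {}

    // Returns the simple class name of whatever ends up at the SOFA address.
    public static String simulate() {
        Map<Integer, FeatureStructure> addr2fs = new HashMap<>();
        int sofaAddr = 1;   // SOFA reference stored in the annotation
        int heapIndex = 1;  // heap address of the annotation itself

        // Step 1: no SOFA is known yet, so one is created implicitly at key 1.
        addr2fs.computeIfAbsent(sofaAddr, k -> new Sofa());

        // Step 2: the annotation is registered under its own heap index,
        // which replaces the SOFA entry because both keys are 1.
        addr2fs.put(heapIndex, new Annotation());

        // Step 3: the next annotation resolves its SOFA and finds an Annotation.
        return addr2fs.get(sofaAddr).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // Casting the entry found here to Sofa would throw ClassCastException.
        System.out.println("Entry at SOFA address: " + simulate());
    }
}
```

In this sketch the lookup at the SOFA address yields an `Annotation`, which matches the `ClassCastException` in the stack trace.
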
> The heap looks something like this:
> {code}
> [0, 187, 1, 33, 46, 199, 200, 201, 44, 202, 187, 1, 33, 46, 203, 204, 205, 
> 45, 206, 187, 1, 33, 46, 207, 208, 209, 46, 210, 187, 1, 33, 46, 211, 212, 
> 213, 47, 214, 187, 1, 33, 46, 215, 216, 217, 48, 1, 187, 1,...
> {code}
> I guess that 187 is the type code of the first annotation, and we can see it 
> repeat several times. The 1 seems to be the SOFA ID, the first feature of 
> each feature structure. However, instead of referring to the address of the 
> SOFA, 1 points at the first annotation, which is NOT a SOFA.
> Could this be a bug in the serialization code, which assumes that the SOFA is 
> always in the first position?
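
The reading of the heap excerpt given in the description can be checked mechanically. This is a rough sketch under stated assumptions: the record length of 9 cells is inferred only from the repetition pattern in the excerpt, the first cell of each record is taken to be the type code and the second the SOFA reference (as guessed in the description), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HeapExcerptSketch {
    // Excerpt copied from the heap dump in the description; the trailing
    // "187, 1" of the incomplete last record is left out.
    static final int[] HEAP = {
        0,
        187, 1, 33, 46, 199, 200, 201, 44, 202,
        187, 1, 33, 46, 203, 204, 205, 45, 206,
        187, 1, 33, 46, 207, 208, 209, 46, 210,
        187, 1, 33, 46, 211, 212, 213, 47, 214,
        187, 1, 33, 46, 215, 216, 217, 48, 1
    };

    // Assumption: each record of type 187 occupies 9 heap cells.
    static final int RECORD_SIZE = 9;

    // Returns one {address, typeCode, sofaRef} triple per complete record.
    public static List<int[]> decode(int[] heap) {
        List<int[]> records = new ArrayList<>();
        // Address 0 is skipped; records start at address 1.
        for (int addr = 1; addr + RECORD_SIZE <= heap.length; addr += RECORD_SIZE) {
            records.add(new int[] { addr, heap[addr], heap[addr + 1] });
        }
        return records;
    }

    public static void main(String[] args) {
        for (int[] r : decode(HEAP)) {
            System.out.println("addr=" + r[0] + " type=" + r[1] + " sofaRef=" + r[2]
                    + " typeAtSofaRef=" + HEAP[r[2]]);
        }
    }
}
```

Under these assumptions every record carries the SOFA reference 1, and the cell at address 1 holds type code 187, the same type code as the annotations themselves, rather than a separate SOFA type.
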



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
