> However, PDFBox 2.0.8-SNAPSHOT has more 0s, 1s, 2s and 3s...
>
> The TOP_10_MORE_IN_B column in the contents report shows that there are 15
> more 0s, 15 more 1s, 11 more 2s, etc.
>
> 0: 15 | 1: 15 | 2: 11 | 20: 5 | 3: 2 | 4: 2
>Yeah, but where do they come from? Not from the pure text extraction. In the
>JSON files, I see that there are many "0:", "1:" in the new file. I wonder if
>this is about AcroForm fields? It can be seen, e.g., near b12c96nfdate36.
Sorry, you're right, it's AcroForm. We're now getting some children that we weren't getting before.
2.0.8-SNAPSHOT:
<li altName="date362">@@b12c96nfdate362: </li>
<ol> <li altName="date362">0: </li>
<li altName="date362">1: </li>
<li altName="date362">2: 20 </li>
</ol>
<li altName="date362">b12c96nfdate362: 20</li>
2.0.7:
<li altName="date362">@@b12c96nfdate362: </li>
<li altName="date362">b12c96nfdate362: 20</li>
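
For anyone following along, here is a rough sketch of how a "more in B" column like the one quoted above could be computed from two extracts. This is not the eval tool's actual code: the class and method names (`TokenDiff`, `countTokens`, `moreInB`) are made up for illustration, and plain whitespace splitting stands in for the real tokenizer, so `0:` keeps its colon here. The sample strings in `main` are just the text content of the two snippets above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class TokenDiff {

    // Count whitespace-separated tokens (a simplification of real tokenization).
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Tokens that occur more often in b than in a, paired with the surplus.
    // Sorted by surplus (largest first), ties broken by token; at most `limit` entries.
    static List<Map.Entry<String, Integer>> moreInB(String a, String b, int limit) {
        Map<String, Integer> countsA = countTokens(a);
        List<Map.Entry<String, Integer>> surplus = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countTokens(b).entrySet()) {
            int diff = e.getValue() - countsA.getOrDefault(e.getKey(), 0);
            if (diff > 0) {
                surplus.add(new AbstractMap.SimpleEntry<>(e.getKey(), diff));
            }
        }
        surplus.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(surplus.subList(0, Math.min(limit, surplus.size())));
    }

    public static void main(String[] args) {
        // Text content of the 2.0.7 and 2.0.8-SNAPSHOT snippets, tags stripped.
        String a = "@@b12c96nfdate362: b12c96nfdate362: 20";
        String b = "@@b12c96nfdate362: 0: 1: 2: 20 b12c96nfdate362: 20";
        for (Map.Entry<String, Integer> e : moreInB(a, b, 10)) {
            System.out.println(e.getKey() + " +" + e.getValue());
        }
    }
}
```

With the two snippets above as input, the extra tokens are exactly the ones contributed by the new `<ol>` children, which fits the AcroForm explanation.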