[ 
https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500711#comment-17500711
 ] 

Lorenz Bühmann commented on JENA-2225:
--------------------------------------

Not sure if opening a new issue would be better, but I guess we're not done 
here. We didn't notice this earlier because I wasn't aware that TDB2 expects 
the stats file in TDB2_LOCATION/DataXXX.

Now that the stats file is actually being loaded, the change to long values 
causes parse errors while the reorder transformation is set up, because the 
stats reader still assumes integer values:

{{
java.lang.NumberFormatException: For input string: "16666525095"
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.base/java.lang.Integer.parseInt(Integer.java:652)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.jena.sparql.sse.Item.asInteger(Item.java:275)
        at org.apache.jena.sparql.engine.optimizer.StatsMatcher.init(StatsMatcher.java:123)
        at org.apache.jena.sparql.engine.optimizer.StatsMatcher.<init>(StatsMatcher.java:97)
        at org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.weighted(ReorderLib.java:84)
        at org.apache.jena.tdb2.store.TDB2StorageBuilder.chooseReorderTransformation(TDB2StorageBuilder.java:352)
        at org.apache.jena.tdb2.store.TDB2StorageBuilder.build(TDB2StorageBuilder.java:112)
        at org.apache.jena.tdb2.sys.StoreConnection.make(StoreConnection.java:91)
        at org.apache.jena.tdb2.sys.StoreConnection.connectCreate(StoreConnection.java:59)
        at org.apache.jena.tdb2.sys.DatabaseOps.createSwitchable(DatabaseOps.java:100)
        at org.apache.jena.tdb2.sys.DatabaseOps.create(DatabaseOps.java:81)
        at org.apache.jena.tdb2.sys.DatabaseConnection.build(DatabaseConnection.java:101)
        at org.apache.jena.tdb2.sys.DatabaseConnection.lambda$make$0(DatabaseConnection.java:72)
        at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
        at org.apache.jena.tdb2.sys.DatabaseConnection.make(DatabaseConnection.java:72)
        at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:61)
        at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:52)
        at org.apache.jena.tdb2.DatabaseMgr.DB_ConnectCreate(DatabaseMgr.java:41)
        at org.apache.jena.tdb2.DatabaseMgr.connectDatasetGraph(DatabaseMgr.java:46)
        at org.apache.jena.tdb2.TDB2Factory.connectDataset(TDB2Factory.java:40)
        at tdb2.cmdline.ModTDBDataset.createDataset(ModTDBDataset.java:105)
        at arq.cmdline.ModDataset.getDataset(ModDataset.java:35)
        at arq.query.getDataset(query.java:179)
        at arq.query.queryExec(query.java:226)
        at arq.query.exec(query.java:157)
        at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
        at tdb2.tdbquery.main(tdbquery.java:30)
}}
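
For illustration, here is a minimal JDK-only sketch of the parse failure in the trace above. The class and method names (CountParseSketch, fitsInInt, parseCount) are hypothetical and not part of Jena. It only shows that the value from the exception message is rejected by Integer.parseInt but accepted by Long.parseLong. It does not claim to be how the stats reader should be fixed.

```java
public class CountParseSketch {

    // Hypothetical helper: true if the text parses as a 32-bit int.
    public static boolean fitsInInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            // Integer.parseInt rejects values outside the int range.
            return false;
        }
    }

    // Hypothetical helper: parse a count as a 64-bit value instead.
    public static long parseCount(String text) {
        return Long.parseLong(text);
    }

    public static void main(String[] args) {
        // Value copied from the NumberFormatException message above.
        String count = "16666525095";
        System.out.println("fits in int: " + fitsInInt(count));
        System.out.println("as long:     " + parseCount(count));
    }
}
```

Any reader that goes through Integer.parseInt will hit the same exception as soon as a count in the stats file exceeds Integer.MAX_VALUE (2147483647).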

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> --------------------------------------------------------------------
>
>                 Key: JENA-2225
>                 URL: https://issues.apache.org/jira/browse/JENA-2225
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB, TDB2
>    Affects Versions: Jena 4.3.1
>            Reporter: Lorenz Bühmann
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 4.4.0
>
>
> When computing the TDB/TDB2 stats via the CLI, the size is serialized 
> incorrectly for large datasets.
> For example, for the latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, in both TDB and TDB2, the corresponding `Stats.java` 
> class forces an Integer-typed Node even though the value is a long:
> {code:java}
> if ( count >= 0 )
>     addPair(meta.getList(), StatsMatcher.COUNT, NodeFactoryExtra.intToNode((int)count)) ;
> {code}
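
To make the effect of the (int) cast in the quoted description concrete, here is a small JDK-only sketch. The class and method names (CountNarrowingSketch, narrowUnchecked, narrowChecked) are hypothetical and not Jena code. The sample value is the one from the stack trace in this thread, used only as an example of a count above Integer.MAX_VALUE.

```java
public class CountNarrowingSketch {

    // Same kind of narrowing as the (int) cast in the quoted snippet:
    // the upper 32 bits are silently dropped.
    public static int narrowUnchecked(long count) {
        return (int) count;
    }

    // Math.toIntExact throws ArithmeticException instead of truncating.
    public static int narrowChecked(long count) {
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        long count = 16666525095L; // example value above Integer.MAX_VALUE

        // The truncated value no longer equals the original count.
        System.out.println("unchecked cast: " + narrowUnchecked(count));

        try {
            narrowChecked(count);
        } catch (ArithmeticException e) {
            System.out.println("checked cast failed: " + e.getMessage());
        }
    }
}
```

A count can pass the count >= 0 check as a long and still be written out as a negative int, which fits the negative value shown in the description.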



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
