[ https://issues.apache.org/jira/browse/JENA-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500711#comment-17500711 ]
Andy Seaborne edited comment on JENA-2225 at 3/3/22, 1:05 PM (Author: lorenzb):
--------------------------------------------------------------

Not sure if opening a new issue would be better, but I guess we're not done here.

We didn't notice this earlier because I didn't know that TDB2 expects the stats file in TDB2_LOCATION/DataXXX.

Now that the stats are being loaded, the change to long values causes further parse errors while the reordering is set up, because some code still assumes integer values:
{noformat}
java.lang.NumberFormatException: For input string: "16666525095"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:652)
    at java.base/java.lang.Integer.parseInt(Integer.java:770)
    at org.apache.jena.sparql.sse.Item.asInteger(Item.java:275)
    at org.apache.jena.sparql.engine.optimizer.StatsMatcher.init(StatsMatcher.java:123)
    at org.apache.jena.sparql.engine.optimizer.StatsMatcher.<init>(StatsMatcher.java:97)
    at org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.weighted(ReorderLib.java:84)
    at org.apache.jena.tdb2.store.TDB2StorageBuilder.chooseReorderTransformation(TDB2StorageBuilder.java:352)
    at org.apache.jena.tdb2.store.TDB2StorageBuilder.build(TDB2StorageBuilder.java:112)
    at org.apache.jena.tdb2.sys.StoreConnection.make(StoreConnection.java:91)
    at org.apache.jena.tdb2.sys.StoreConnection.connectCreate(StoreConnection.java:59)
    at org.apache.jena.tdb2.sys.DatabaseOps.createSwitchable(DatabaseOps.java:100)
    at org.apache.jena.tdb2.sys.DatabaseOps.create(DatabaseOps.java:81)
    at org.apache.jena.tdb2.sys.DatabaseConnection.build(DatabaseConnection.java:101)
    at org.apache.jena.tdb2.sys.DatabaseConnection.lambda$make$0(DatabaseConnection.java:72)
    at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
    at org.apache.jena.tdb2.sys.DatabaseConnection.make(DatabaseConnection.java:72)
    at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:61)
    at org.apache.jena.tdb2.sys.DatabaseConnection.connectCreate(DatabaseConnection.java:52)
    at org.apache.jena.tdb2.DatabaseMgr.DB_ConnectCreate(DatabaseMgr.java:41)
    at org.apache.jena.tdb2.DatabaseMgr.connectDatasetGraph(DatabaseMgr.java:46)
    at org.apache.jena.tdb2.TDB2Factory.connectDataset(TDB2Factory.java:40)
    at tdb2.cmdline.ModTDBDataset.createDataset(ModTDBDataset.java:105)
    at arq.cmdline.ModDataset.getDataset(ModDataset.java:35)
    at arq.query.getDataset(query.java:179)
    at arq.query.queryExec(query.java:226)
    at arq.query.exec(query.java:157)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
    at tdb2.tdbquery.main(tdbquery.java:30)
{noformat}

> TDB/TDB2 dataset size stat serialized incorrectly for large datasets
> --------------------------------------------------------------------
>
>                 Key: JENA-2225
>                 URL: https://issues.apache.org/jira/browse/JENA-2225
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB, TDB2
>    Affects Versions: Jena 4.3.1
>            Reporter: Lorenz Bühmann
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 4.4.0
>
>
> When computing the TDB/TDB2 stats via the CLI, the size is serialized
> incorrectly for large datasets.
> For example, for the latest Wikidata Truthy we get
> {noformat}
> (count -1983667112)){noformat}
> This happens because, for both TDB and TDB2, the corresponding `Stats.java`
> class forces an Integer-typed Node even though the value is a long:
> {code:java}
> if ( count >= 0 )
>     addPair(meta.getList(), StatsMatcher.COUNT, NodeFactoryExtra.intToNode((int)count)) ;
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
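
The two symptoms in this thread (a negative count in the stats file, and the `NumberFormatException` when the stats are read back) can be reproduced with the JDK alone. The following is a minimal sketch, not Jena code: the class and method names (`CountOverflowDemo`, `narrow`, `parsesAsInt`) are made up for illustration, and the input value is the one shown in the exception message.

```java
public class CountOverflowDemo {

    /** Narrowing cast, as in the quoted snippet: keeps only the low 32 bits of the long. */
    static int narrow(long count) {
        return (int) count;
    }

    /** Returns true if Integer.parseInt accepts the text, false if it throws. */
    static boolean parsesAsInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value taken from the exception message in the comment.
        String text = "16666525095";

        // The value fits in a 64-bit long, so Long.parseLong accepts it.
        long count = Long.parseLong(text);
        System.out.println("parses as long: " + count);

        // It exceeds Integer.MAX_VALUE (2147483647), so Integer.parseInt rejects it.
        System.out.println("parses as int:  " + parsesAsInt(text));

        // The cast does not throw; it silently wraps to a negative number.
        System.out.println("(int) cast:     " + narrow(count));
    }
}
```

The cast wraps silently on the writing side, while `Integer.parseInt` fails loudly on the reading side, which is presumably why the second problem only surfaced once the stats file was actually being loaded.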