[ 
https://issues.apache.org/jira/browse/RYA-43?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854404#comment-15854404
 ] 

David W. Lotts commented on RYA-43:
-----------------------------------

Here is a reasonable (great?) solution!
The need for backward compatibility is why this has not been fixed yet.  This 
solution achieves backward compatibility by combining the encoding from 
LexiTypeEncoders.bigIntegerEncoder() with the existing 
LexiTypeEncoders.integerEncoder().  

The idea, which should not break existing Rya repositories:
Encode Java-sized integers as-is.  For anything out of range, encode 
Integer.MAX_VALUE or Integer.MIN_VALUE and concatenate the new big integer 
encoding.

Pros: Regular integers are unencumbered.
Cons:
The only disadvantage I see is that every large integer literal stored will 
have an extra 8 bytes.  

Here is the current encoding, which returns a String, in class 
org.apache.rya.api.resolver.impl.IntegerRyaTypeResolver:
    return INTEGER_STRING_TYPE_ENCODER.encode(Integer.parseInt(data));

Here is my replacement:

// value is still a String here; fix this with parseInt() and catch, or similar
if (value >= Integer.MAX_VALUE) {
    return INTEGER_STRING_TYPE_ENCODER.encode(Integer.MAX_VALUE)
            + LexiTypeEncoders.bigIntegerEncoder().encode(value);
} else if (value <= Integer.MIN_VALUE) {  // fix this also, as above
    return INTEGER_STRING_TYPE_ENCODER.encode(Integer.MIN_VALUE)
            + LexiTypeEncoders.bigIntegerEncoder().encode(value);
} else {
    return INTEGER_STRING_TYPE_ENCODER.encode(Integer.parseInt(data));
}

That's it!
You need to figure out a good way to do the comparison before converting from a 
String.  Catching the parse exception probably makes sense.  Deserialization 
also needs to be coded to reverse this.
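
To make the round trip concrete, here is a minimal, self-contained sketch 
of both directions.  It takes the exception-catch route, so 
Integer.MAX_VALUE and Integer.MIN_VALUE themselves stay on the old path 
(unlike the >= / <= comparison in the snippet above).  Everything in it is 
an assumption for illustration: the class name is made up, and encodeInt() 
and encodeBig() are stand-ins for INTEGER_STRING_TYPE_ENCODER and 
LexiTypeEncoders.bigIntegerEncoder(), not the real encoders.

```java
import java.math.BigInteger;

/** Sketch of the mixed int / big integer encoding. Both encoders are stand-ins. */
final class MixedIntegerEncodingSketch {

    // Width of the stand-in int encoding, in characters.
    static final int INT_WIDTH = 8;

    // Stand-in for INTEGER_STRING_TYPE_ENCODER: 8 hex digits with the sign bit
    // flipped, so the strings sort in numeric order. The real encoder may differ.
    static String encodeInt(int value) {
        return String.format("%08x", value ^ Integer.MIN_VALUE);
    }

    static int decodeInt(String encoded) {
        return Integer.parseUnsignedInt(encoded, 16) ^ Integer.MIN_VALUE;
    }

    // Stand-in for LexiTypeEncoders.bigIntegerEncoder(): plain decimal text.
    // NOT order-preserving; it only marks where the real encoding would go.
    static String encodeBig(BigInteger value) {
        return value.toString();
    }

    static BigInteger decodeBig(String encoded) {
        return new BigInteger(encoded);
    }

    // In-range values keep the old encoding. Out-of-range values get the
    // MAX/MIN int encoding as a prefix, followed by the big integer encoding.
    static String serialize(String data) {
        try {
            return encodeInt(Integer.parseInt(data));
        } catch (NumberFormatException outOfRangeOrInvalid) {
            // Still throws NumberFormatException if data is not a number at all.
            BigInteger big = new BigInteger(data);
            int sentinel = big.signum() > 0 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
            return encodeInt(sentinel) + encodeBig(big);
        }
    }

    // Reverse of serialize: a bare int-width string is an old-style int,
    // anything longer carries the big integer encoding after the prefix.
    static String deserialize(String encoded) {
        if (encoded.length() == INT_WIDTH) {
            return Integer.toString(decodeInt(encoded));
        }
        return decodeBig(encoded.substring(INT_WIDTH)).toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5", "-7", "2147483647", "9223372036854775807"}) {
            String encoded = serialize(s);
            System.out.println(s + " -> " + encoded + " -> " + deserialize(encoded));
        }
    }
}
```

In this sketch the decoder tells the two cases apart by length alone, so 
values already written by the old encoder decode unchanged.  Whether a 
length check is safe with the real encoders would need to be confirmed.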

david.

> NumberFormatException for large integers
> ----------------------------------------
>
>                 Key: RYA-43
>                 URL: https://issues.apache.org/jira/browse/RYA-43
>             Project: Rya
>          Issue Type: Bug
>    Affects Versions: 3.2.10
>            Reporter: Jesse Hatfield
>            Assignee: David W. Lotts
>         Attachments: integer
>
>
> Attempting to insert a value with datatype {{xsd:integer}} and value outside 
> the range of a Java int will fail with an exception.
> It looks like Rya resolves any {{xsd:integer}} as an int, whereas the 
> [XMLSchema specification|https://www.w3.org/TR/xmlschema11-2/#integer] 
> defines {{xsd:integer}} as the infinite set of all integers (with subsets 
> {{xsd:long}} and {{xsd:int}} having bounded range). Therefore we fail to 
> parse what should be a valid triple.
> Example input:
> {code}<http://dbpedia.org/resource/Pseudohypoaldosteronism> 
> <http://dbpedia.org/ontology/omim> 
> "9223372036854775807"^^<http://www.w3.org/2001/XMLSchema#integer> .{code}
> Result:
> {code}
> $ hadoop jar accumulo.rya-3.2.10-SNAPSHOT-shaded.jar 
> mvm.rya.accumulo.mr.fileinput.RdfFileInputTool -conf conf.xml 
> -Drdf.tablePrefix=int_bug_ -Drdf.format=N-Triples /input/integer.nt
> [...]
> Error: java.io.IOException: 
> mvm.rya.api.resolver.triple.TripleRowResolverException: 
> mvm.rya.api.resolver.RyaTypeResolverException: Exception occurred serializing 
> data[9223372036854775807]
>       at 
> mvm.rya.accumulo.RyaTableMutationsFactory.serialize(RyaTableMutationsFactory.java:75)
>       at 
> mvm.rya.accumulo.mr.fileinput.RdfFileInputTool$StatementToMutationMapper.map(RdfFileInputTool.java:157)
>       at 
> mvm.rya.accumulo.mr.fileinput.RdfFileInputTool$StatementToMutationMapper.map(RdfFileInputTool.java:124)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: mvm.rya.api.resolver.triple.TripleRowResolverException: 
> mvm.rya.api.resolver.RyaTypeResolverException: Exception occurred serializing 
> data[9223372036854775807]
>       at 
> mvm.rya.api.resolver.triple.impl.WholeRowTripleResolver.serialize(WholeRowTripleResolver.java:82)
>       at 
> mvm.rya.api.resolver.RyaTripleContext.serializeTriple(RyaTripleContext.java:85)
>       at 
> mvm.rya.accumulo.RyaTableMutationsFactory.serialize(RyaTableMutationsFactory.java:67)
>       ... 10 more
> Caused by: mvm.rya.api.resolver.RyaTypeResolverException: Exception occurred 
> serializing data[9223372036854775807]
>       at 
> mvm.rya.api.resolver.impl.IntegerRyaTypeResolver.serializeData(IntegerRyaTypeResolver.java:50)
>       at 
> mvm.rya.api.resolver.impl.RyaTypeResolverImpl.serializeType(RyaTypeResolverImpl.java:82)
>       at mvm.rya.api.resolver.RyaContext.serializeType(RyaContext.java:121)
>       at 
> mvm.rya.api.resolver.triple.impl.WholeRowTripleResolver.serialize(WholeRowTripleResolver.java:64)
>       ... 12 more
> Caused by: java.lang.NumberFormatException: For input string: 
> "9223372036854775807"
>       at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>       at java.lang.Integer.parseInt(Integer.java:583)
>       at java.lang.Integer.parseInt(Integer.java:615)
>       at 
> mvm.rya.api.resolver.impl.IntegerRyaTypeResolver.serializeData(IntegerRyaTypeResolver.java:48)
>       ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
