[ https://issues.apache.org/jira/browse/TINKERPOP-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428432#comment-17428432 ]
ASF GitHub Bot commented on TINKERPOP-848:
------------------------------------------

amatiushkin commented on a change in pull request #1485:
URL: https://github.com/apache/tinkerpop/pull/1485#discussion_r727717241

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
`getElementText()` internally creates a `new StringBuilder()` on every call and appends the text (`getText()`) of the given element. Attempting to use non-text elements would throw various exceptions.
I have checked multiple implementations of this method, and it seems the concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ?
defaultValue : elementValue;

Review comment:
For cases without a value like the one below, there is no `default` value, so related events won't be emitted at all.
```xml
<data key="d_n"/>
```
My implementation could generate an NPE in such cases, because `defaultValues.get(key)` returns `null` there and `defaultValue.length()` is then called on it. I'll add tests for that and fix it.

Btw, another use case: a `default` tag alone, without a value, should give an _empty string_ by default.
```
<key id="d_n" for="node" attr.name="modification" attr.type="string">
  <default/>
</key>
```
I'll add it as well.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
For cases without a value like the one below, there is no `default` value, so related events won't be emitted at all.
```xml
<data key="d_n"/>
```
My implementation could generate an NPE in such cases, because `defaultValues.get(key)` returns `null` there and `defaultValue.length()` is then called on it. I'll add tests for that and fix it.

Btw, another use case: a `default` tag alone, without a value, should give an _empty string_ by default.
```xml
<key id="d_n" for="node" attr.name="modification" attr.type="string">
  <default/>
</key>
```
I'll add it as well.
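
A minimal sketch of the null-safe fallback described in the comment above, not the code that was eventually committed. The class `DefaultValueSketch` and the method `resolve` are hypothetical names used only for illustration, and the map contents are made-up sample values. The sketch assumes, as the review comment states, that `getElementText()` always returns a string, so only the looked-up default needs a null check.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultValueSketch {

    // Hypothetical helper: use the element text when present; fall back to the
    // key's default only when the text is empty AND a default was declared.
    // A null default (no <default> tag for the key) never gets dereferenced.
    public static String resolve(final String elementValue, final String defaultValue) {
        if (elementValue.isEmpty() && defaultValue != null) {
            return defaultValue;
        }
        return elementValue;
    }

    public static void main(final String[] args) {
        // Sample defaults keyed by GraphML key id (illustrative values only).
        final Map<String, String> defaultValues = new HashMap<>();
        defaultValues.put("d_e", "knows"); // <default>knows</default>
        defaultValues.put("d_n", "");      // <default/>

        // <data key="d_e"/> with a declared default
        System.out.println("[" + resolve("", defaultValues.get("d_e")) + "]");
        // <data key="d_n"/> with an empty <default/>
        System.out.println("[" + resolve("", defaultValues.get("d_n")) + "]");
        // <data key="d_x"/> with no default at all: get() returns null, no NPE
        System.out.println("[" + resolve("", defaultValues.get("d_x")) + "]");
        // Explicit element text always wins over the default
        System.out.println("[" + resolve("created", defaultValues.get("d_e")) + "]");
    }
}
```

Checking the default for `null` instead of calling `length()` on it keeps the original ternary's behaviour for keys that do declare a default, while the missing-default case simply yields the (empty) element text.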
##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
This does not look right to me; the key should be `d_n`. There is no key with id="modification", so it should be an invalid XML document to begin with.
```xml
<node id="6315">
  <data key="modification"/>
</node>
```
That said, I will add this use case as well, just to see how it performs.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
For cases without a value like the one below, there is no `default` value, so related events won't be emitted at all.
```xml
<data key="d_n"/>
```
My implementation will generate an **NPE** in such cases, because `defaultValues.get(key)` returns `null` there and `defaultValue.length()` is then called on it. I'll add tests for that and fix it.
Btw, another use case: a `default` tag alone, without a value, should give an _empty string_ by default.
```xml
<key id="d_n" for="node" attr.name="modification" attr.type="string">
  <default/>
</key>
```
I'll add it as well.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
For cases without a value like the one below, there is no `default` value, so related events won't be emitted at all.
```xml
<data key="d_n"/>
```
⚠️ My implementation will generate an **NPE** in such cases, because `defaultValues.get(key)` returns `null` there and `defaultValue.length()` is then called on it. I'll add tests for that and fix it.

Btw, another use case: a `default` tag alone, without a value, should give an _empty string_ by default.
```xml
<key id="d_n" for="node" attr.name="modification" attr.type="string">
  <default/>
</key>
```
I'll add it as well.

##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
re: tinkerpop-modern.xml
I also plan to add an edge case with CDATA/PDATA once I find a good example.

Does it make sense to make the `gremlin-test` component a test dependency of `gremlin-core`?
I am not a big fan of duplicating things (including tests), because future contributors would have to remember to update the modern dataset in a few more places.
On the other hand, a test dependency would add a little complexity to the build process itself.
WDYT?

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
`getElementText()` internally creates a `new StringBuilder()` on every call and appends the text (`getText()`) of the given element. Attempting to use non-text elements would throw various exceptions.
I have checked multiple implementations of this method, and it seems the concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
`getElementText()` internally creates a `new StringBuilder()` on every call and appends the text (`getText()`) of the given element. Attempting to use non-text elements would generate various exceptions.
I have checked multiple implementations of this method, and it seems the concrete classes are null-safe.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLReader.java
##########
@@ -138,9 +143,11 @@ public void readGraph(final InputStream graphInputStream, final Graph graphToWri
     case GraphMLTokens.DATA:
         final String key = reader.getAttributeValue(null, GraphMLTokens.KEY);
         final String dataAttributeName = keyIdMap.get(key);
+        final String defaultValue = defaultValues.get(key);
         if (dataAttributeName != null) {
-            final String value = reader.getElementText();
+            String elementValue = reader.getElementText();
+            final String value = elementValue.length() == 0 && defaultValue.length() != 0 ? defaultValue : elementValue;

Review comment:
`getElementText()` internally creates a `new StringBuilder()` on every call and appends the text (`getText()`) of the given element. Attempting to use non-text elements would generate various exceptions.
I have checked multiple implementations of this method, and it seems the concrete classes are null-safe for this method, i.e. it always returns a string.

##########
File path: gremlin-core/src/test/resources/graphml/sample.graphml.xml
##########
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>

Review comment:
re: tinkerpop-modern.xml
I also plan to add an edge case with CDATA/PDATA once I find a good example.

Does it make sense to make the `gremlin-test` component a test dependency of `gremlin-core`?
I am not a big fan of duplicating things (including tests), because future contributors would have to remember to update the modern dataset in a few more places.
On the other hand, a test dependency would add a little complexity to the build process itself.
WDYT?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Support default attribute values in GraphMLReader
> -------------------------------------------------
>
>                 Key: TINKERPOP-848
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-848
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.0.2-incubating
>            Reporter: Pavel Klinov
>            Priority: Trivial
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Looking at the code of GraphMLReader I see that it doesn't support default
> values of attributes, which are allowed by the GraphML spec. This is a bit
> annoying especially if the input defines default values for attributes which
> are used for mandatory data, e.g. edge labels.
> One small example is the sample graph at [1]. "d_e" is the label attribute
> with a default value. There're <edge .. /> elements w/o body later in the
> document and reading those will throw a "java.lang.IllegalArgumentException:
> Label can not be null" exception (if the vendor considers edge labels
> mandatory).
> I'd personally squash both keyIdMap and keyTypesMap into a single String ->
> AttrInfo map, where AttrInfo would contain information about the data
> attribute name, type, and the default value.
> [1] http://www.eecs.wsu.edu/~yyao/DirectedStudyI/Datasets/AS/sample.graphml

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
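
The single String -> AttrInfo map suggested in the issue description could look roughly like the sketch below. The issue only names `AttrInfo` and the three pieces of information it should hold, so the field names, the `valueOrDefault` helper, the wrapper class `AttrInfoSketch`, and the sample values ("label", "string", "knows") are all assumptions made for illustration, not part of GraphMLReader.

```java
import java.util.HashMap;
import java.util.Map;

public class AttrInfoSketch {

    /** Hypothetical value class holding what keyIdMap and keyTypesMap store separately. */
    public static final class AttrInfo {
        public final String name;          // attr.name of the <key> element
        public final String type;          // attr.type of the <key> element
        public final String defaultValue;  // text of <default>, or null when none is declared

        public AttrInfo(final String name, final String type, final String defaultValue) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
        }

        /** Returns the element text, or the declared default when the text is empty. */
        public String valueOrDefault(final String elementText) {
            return elementText.isEmpty() && defaultValue != null ? defaultValue : elementText;
        }
    }

    public static void main(final String[] args) {
        // One map keyed by the GraphML key id, instead of two parallel maps.
        final Map<String, AttrInfo> attrs = new HashMap<>();
        attrs.put("d_e", new AttrInfo("label", "string", "knows")); // illustrative values

        // An <edge .. /> without a body would pick up the default label.
        final AttrInfo info = attrs.get("d_e");
        System.out.println(info.name + " = " + info.valueOrDefault(""));
    }
}
```

Keeping the name, type, and default together means a single lookup per `<data>` element, and the fallback rule lives in one place rather than at each call site.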