[
https://issues.apache.org/jira/browse/JENA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409415#comment-16409415
]
Vladimir Alexiev commented on JENA-1505:
----------------------------------------
I'm no programmer but will see if someone at Ontotext can do it. Should be easy
by copying the strSplit files.
* Looking at
[strSplit.java|https://github.com/apache/jena/blob/eb4b5b6893c1fe9647251167e79ab082c892f28a/jena-arq/src/main/java/org/apache/jena/sparql/pfunction/library/strSplit.java]
everything seems clear: use argSubject.getArg(0) and argSubject.getArg(1) to
bind ?index and ?value
* But looking at
[TestStrSplit.java|https://github.com/apache/jena/blob/eb4b5b6893c1fe9647251167e79ab082c892f28a/jena-arq/src/test/java/org/apache/jena/sparql/pfunction/library/TestStrSplit.java],
we'll need a new function to check *two* bindings in the resultset, say
assertAllXY. Where is assertAllX defined? I can only
[find|https://github.com/apache/jena/search?utf8=%E2%9C%93&q=assertallx&type=]
its use in TestStrSplit, not its definition.
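
To make the intended semantics concrete, here is a minimal sketch in plain Java. It is not the Jena property-function API and is not derived from strSplit.java. The names StrIndexSplitSketch, IndexedValue and the static strIndexSplit method are hypothetical, and keeping trailing empty parts (split limit -1) is an assumption, not something the proposal specifies. Each returned pair stands in for one (?index ?value) binding, numbered from 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StrIndexSplitSketch {

    /** Hypothetical stand-in for one (?index ?value) binding pair. */
    static final class IndexedValue {
        final int index;
        final String value;

        IndexedValue(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /**
     * Splits string on the regex separatorRegex and numbers the parts from 1.
     * The limit of -1 keeps trailing empty parts (an assumption of this sketch).
     */
    static List<IndexedValue> strIndexSplit(String string, String separatorRegex) {
        String[] parts = Pattern.compile(separatorRegex).split(string, -1);
        List<IndexedValue> result = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            result.add(new IndexedValue(i + 1, parts[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "1 -> c1", "2 -> c2", "3 -> c3", one pair per line.
        for (IndexedValue iv : strIndexSplit("c1\nc2\nc3", "\\n")) {
            System.out.println(iv.index + " -> " + iv.value);
        }
    }
}
```

A test helper along the lines of assertAllXY would then compare both components of each pair rather than a single value.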
> add function apf:strIndexSplit
> ------------------------------
>
> Key: JENA-1505
> URL: https://issues.apache.org/jira/browse/JENA-1505
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Vladimir Alexiev
> Priority: Major
>
> We use Tarql to convert some company CSV data to RDF.
> We had cases of multiple values in a field (eg aliases) that we handle with
> apf:strSplit.
> But now we've hit another case: several multi-value fields arranged in
> parallel arrays.
> Each CSV row is a Joint Venture (?jvId, ?jvName) and there are 3
> newline-separated parallel arrays that describe the participant companies:
> ?coIds, ?coNames, ?coIndustries.
> If we use several apf:strSplit in one query, that will cause a Cartesian
> product, and mix up all company ids, names, industries together.
> Tarql allows multiple CONSTRUCT queries in one script, and "the triples
> generated by previous CONSTRUCT clauses can be queried in subsequent WHERE
> clauses to retrieve additional data". So my idea is to split each column in a
> separate CONSTRUCT, attach the values to temporary nodes, and reassemble them
> in a final CONSTRUCT.
> But we can't do this with apf:strSplit, since it loses the index (ordering)
> of the individual values.
> We need a new Jena ARQ function, eg with a signature like this, where ?
> indicates unbound and $ indicates bound:
> {noformat}
> (?index ?value) apf:strIndexSplit ($string $separator)
> Splits $string on regex $separator and produces a number of binding pairs
> where ?index is bound to a sequential number (starting from 1)
> and ?value is bound to the consecutive string part that is split off.
> {noformat}
> Then we could hack the problem with something like this:
> {noformat}
> construct { # get first multiValue field
> ?ROW tmp:coIds [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
> bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
> (?INDEX ?VALUE) apf:strIndexSplit (?coIds "\\n")
> }
> construct { # get second multiValue field
> ?ROW tmp:coNames [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
> bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
> (?INDEX ?VALUE) apf:strIndexSplit (?coNames "\\n")
> }
> construct { # get third multiValue field
> ?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
> bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
> (?INDEX ?VALUE) apf:strIndexSplit (?coIndustries "\\n")
> }
> construct { # make JV node
> ?JV ex:id ?jvId; ex:name ?jvName.
> } where {
> bind(uri(concat("jv/",?jvId)) as ?JV)
> }
> construct { # make Company node and relation
> ?CO ex:id ?coId; ex:name ?coName; ex:industry ?INDUSTRY.
> ?JV ex:hasParticipant ?CO
> } where {
> bind(uri(concat("jv/",?jvId)) as ?JV)
> bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
> ?ROW tmp:coIds [tmp:index ?INDEX; tmp:value ?coId]
> optional {?ROW tmp:coNames [tmp:index ?INDEX; tmp:value ?coName]}
> optional {?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?coIndustry]}
> bind(uri(concat("company/",?coId)) as ?CO)
> bind(uri(concat("industry/",?coIndustry)) as ?INDUSTRY)
> }
> {noformat}
>
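
The difference between the Cartesian product that several apf:strSplit calls would produce and the index-aligned reassembly the script above relies on can be sketched in plain Java. This only illustrates the join logic and involves no Jena or Tarql code. The class and method names, the "|" row format and the "(none)" placeholder for a missing value (mirroring the optional clauses) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelArraysSketch {

    /** Every id paired with every name: what independent splits produce. */
    static List<String> cartesian(String[] ids, String[] names) {
        List<String> rows = new ArrayList<>();
        for (String id : ids) {
            for (String name : names) {
                rows.add(id + "|" + name);
            }
        }
        return rows;
    }

    /** Ids paired with the name at the same position: a join on the index. */
    static List<String> joinByIndex(String[] ids, String[] names) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            // A shorter names array leaves the value missing, like an unmatched OPTIONAL.
            String name = i < names.length ? names[i] : "(none)";
            rows.add(ids[i] + "|" + name);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] ids = "c1\nc2".split("\n");
        String[] names = "Acme\nBolt".split("\n");
        System.out.println(cartesian(ids, names));   // [c1|Acme, c1|Bolt, c2|Acme, c2|Bolt]
        System.out.println(joinByIndex(ids, names)); // [c1|Acme, c2|Bolt]
    }
}
```

With n values per field, the Cartesian variant yields n*n rows for two fields, while the index join keeps n rows with each company's values together.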