[ https://issues.apache.org/jira/browse/JENA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409460#comment-16409460 ]
Vladimir Alexiev commented on JENA-1505:
----------------------------------------

> Is there a better way to do this? I don't know how to clean up the "urn:tmp:"
> triples...

[~andy.seaborne] if you look at the case description, it would be much nicer if I could use something like this:
{noformat}
construct {
  # bindings returned only in first row
  ?JV ex:id ?jvId; ex:name ?jvName.
  # bindings returned in first and subsequent rows
  ?CO ex:id ?coId; ex:name ?coName; ex:industry ?INDUSTRY.
  ?JV ex:hasParticipant ?CO
} where {
  ?coId apf:strSplitParallel (?coIds "\\n") .
  ?coName apf:strSplitParallel (?coNames "\\n") .
  ?coIndustry apf:strSplitParallel (?coIndustries "\\n") .
  bind(uri(concat("jv/",?jvId)) as ?JV)
  bind(uri(concat("company/",?coId)) as ?CO)
  bind(uri(concat("industry/",?coIndustry)) as ?INDUSTRY)
}
{noformat}
But is something like this even possible in SPARQL algebra?

> add function apf:strIndexSplit
> ------------------------------
>
>                 Key: JENA-1505
>                 URL: https://issues.apache.org/jira/browse/JENA-1505
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Vladimir Alexiev
>            Priority: Major
>
> We use Tarql to convert some company CSV data to RDF.
> We had cases of multiple values in a field (e.g. aliases) that we handle with
> apf:strSplit.
> But now we've hit another case: several multi-value fields arranged in
> parallel arrays.
> Each CSV row is a Joint Venture (?jvId, ?jvName) and there are 3
> newline-separated parallel arrays that describe the participant companies:
> ?coIds, ?coNames, ?coIndustries.
> If we use several apf:strSplit in one query, that will cause a Cartesian
> product, and mix up all company ids, names, industries together.
> Tarql allows multiple CONSTRUCT queries in one script, and "the triples
> generated by previous CONSTRUCT clauses can be queried in subsequent WHERE
> clauses to retrieve additional data".
> So my idea is to split each column in a separate CONSTRUCT, attach the
> values to temporary nodes, and reassemble them in a final CONSTRUCT.
> But we can't do this with apf:strSplit, since it loses the index (ordering)
> of the individual values.
> We need a new Jena ARQ function, e.g. with a signature like this, where ?
> indicates unbound and $ indicates bound:
> {noformat}
> (?index ?value) apf:strIndexSplit ($string $separator)
> Splits $string on regex $separator and produces a number of binding pairs
> where ?index is bound to a sequential number (starting from 1)
> and ?value is bound to the consecutive string part that is split off.
> {noformat}
> Then we could hack the problem with something like this:
> {noformat}
> construct { # get first multiValue field
>   ?ROW tmp:coIds [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>   bind(uri(concat("urn:tmp:",?ROWNUM)) as ?ROW)
>   (?INDEX ?VALUE) apf:strIndexSplit (?coIds "\\n")
> }
> construct { # get second multiValue field
>   ?ROW tmp:coNames [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>   bind(uri(concat("urn:tmp:",?ROWNUM)) as ?ROW)
>   (?INDEX ?VALUE) apf:strIndexSplit (?coNames "\\n")
> }
> construct { # get third multiValue field
>   ?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>   bind(uri(concat("urn:tmp:",?ROWNUM)) as ?ROW)
>   (?INDEX ?VALUE) apf:strIndexSplit (?coIndustries "\\n")
> }
> construct { # make JV node
>   ?JV ex:id ?jvId; ex:name ?jvName.
> } where {
>   bind(uri(concat("jv/",?jvId)) as ?JV)
> }
> construct { # make Company node and relation
>   ?CO ex:id ?coId; ex:name ?coName; ex:industry ?INDUSTRY.
>   ?JV ex:hasParticipant ?CO
> } where {
>   bind(uri(concat("jv/",?jvId)) as ?JV)
>   bind(uri(concat("urn:tmp:",?ROWNUM)) as ?ROW)
>   ?ROW tmp:coIds [tmp:index ?INDEX; tmp:value ?coId]
>   optional {?ROW tmp:coNames [tmp:index ?INDEX; tmp:value ?coName]}
>   optional {?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?coIndustry]}
>   bind(uri(concat("company/",?coId)) as ?CO)
>   bind(uri(concat("industry/",?coIndustry)) as ?INDUSTRY)
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)