[ 
https://issues.apache.org/jira/browse/JENA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409415#comment-16409415
 ] 

Vladimir Alexiev commented on JENA-1505:
----------------------------------------

I'm no programmer, but I will see if someone at Ontotext can do it. It should be 
easy to do by copying the strSplit files.
 * Looking at  
[strSplit.java|https://github.com/apache/jena/blob/eb4b5b6893c1fe9647251167e79ab082c892f28a/jena-arq/src/main/java/org/apache/jena/sparql/pfunction/library/strSplit.java]
 everything seems clear: use argSubject.getArg(0) and argSubject.getArg(1) to 
bind ?index and ?value
 * But looking at 
[TestStrSplit.java|https://github.com/apache/jena/blob/eb4b5b6893c1fe9647251167e79ab082c892f28a/jena-arq/src/test/java/org/apache/jena/sparql/pfunction/library/TestStrSplit.java],
 we'll need a new helper to check *two* bindings per row of the result set, say 
assertAllXY. Where is assertAllX defined? I can only 
[find|https://github.com/apache/jena/search?utf8=%E2%9C%93&q=assertallx&type=] 
its use in TestStrSplit.
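
To make the intent concrete, here is a rough sketch of the behaviour I have in mind, written as plain Java without any Jena classes. Everything in it is an assumption for illustration: the class StrIndexSplitSketch, the IndexedPart pair (a stand-in for a binding of ?index and ?value), and the assertAllXY helper do not exist in Jena. Keeping trailing empty parts (split limit -1) is also my assumption, chosen so that parallel arrays stay aligned by index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StrIndexSplitSketch {

    /** One (?index, ?value) pair; hypothetical stand-in for a Jena binding. */
    public static final class IndexedPart {
        public final int index;
        public final String value;

        public IndexedPart(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Splits input on the regex separator and numbers the parts from 1. */
    public static List<IndexedPart> split(String input, String separatorRegex) {
        // Assumption: limit -1 keeps trailing empty parts, so indexes line up
        // across parallel arrays even when the last value is empty.
        String[] parts = Pattern.compile(separatorRegex).split(input, -1);
        List<IndexedPart> result = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            result.add(new IndexedPart(i + 1, parts[i]));
        }
        return result;
    }

    /** Hypothetical assertAllXY: checks both the index and the value of every row. */
    public static void assertAllXY(List<IndexedPart> actual, String... expectedValues) {
        if (actual.size() != expectedValues.length) {
            throw new AssertionError("expected " + expectedValues.length
                    + " rows but got " + actual.size());
        }
        for (int i = 0; i < expectedValues.length; i++) {
            IndexedPart row = actual.get(i);
            if (row.index != i + 1 || !row.value.equals(expectedValues[i])) {
                throw new AssertionError("row " + (i + 1) + " was ("
                        + row.index + ", \"" + row.value + "\")");
            }
        }
    }

    public static void main(String[] args) {
        List<IndexedPart> parts = split("C1\nC2\nC3", "\\n");
        assertAllXY(parts, "C1", "C2", "C3");
        for (IndexedPart p : parts) {
            System.out.println(p.index + " " + p.value);
        }
    }
}
```

A real implementation would presumably produce Jena bindings instead of IndexedPart objects, and the real test helper would run a query and inspect the result set, but the pairing of a 1-based index with each split value is the part that matters.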

> add function apf:strIndexSplit
> ------------------------------
>
>                 Key: JENA-1505
>                 URL: https://issues.apache.org/jira/browse/JENA-1505
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Vladimir Alexiev
>            Priority: Major
>
> We use Tarql to convert some company CSV data to RDF.
>  We had cases of multiple values in a field (eg aliases) that we handle with 
> apf:strSplit.
> But now we've hit another case: several multi-value fields arranged in 
> parallel arrays.
>  Each CSV row is a Joint Venture (?jvId, ?jvName) and there are 3 
> newline-separated parallel arrays that describe the participant companies: 
> ?coIds, ?coNames, ?coIndustries.
>  If we use several apf:strSplit in one query, that will cause a Cartesian 
> product, and mix up all company ids, names, industries together.
> Tarql allows multiple CONSTRUCT queries in one script, and "the triples 
> generated by previous CONSTRUCT clauses can be queried in subsequent WHERE 
> clauses to retrieve additional data". So my idea is to split each column in a 
> separate CONSTRUCT, attach the values to temporary nodes, and reassemble them 
> in a final CONSTRUCT.
> But we can't do this with apf:strSplit, since it loses the index (ordering) 
> of the individual values.
>  We need a new Jena ARQ function, eg with a signature like this where ? 
> indicates unbound and $ indicates bound:
> {noformat}
> (?index ?value) apf:strIndexSplit ($string $separator)
> Splits $string on regex $separator and produces a number of binding pairs
> where ?index is bound to a sequential number (starting from 1)
> and ?value is bound to the consecutive string part that is split off.
> {noformat}
> Then we could hack the problem with something like this:
> {noformat}
> construct { # get first multiValue field
>  ?ROW tmp:coIds [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>  bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
>  (?INDEX ?VALUE) apf:strIndexSplit (?coIds "\\n")
> }
> construct { # get second multiValue field
>  ?ROW tmp:coNames [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>  bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
>  (?INDEX ?VALUE) apf:strIndexSplit (?coNames "\\n")
> }
> construct { # get third multiValue field
>  ?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?VALUE]
> } where {
>  bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
>  (?INDEX ?VALUE) apf:strIndexSplit (?coIndustries "\\n")
> }
> construct { # make JV node
>  ?JV ex:id ?jvId; ex:name ?jvName.
> } where {
>  bind(uri(concat("jv/",?jvId)) as ?JV)
> }
> construct { # make Company node and relation
>  ?CO ex:id ?coId; ex:name ?coName; ex:industry ?INDUSTRY.
>  ?JV ex:hasParticipant ?CO
> } where {
>  bind(uri(concat("jv/",?jvId)) as ?JV)
>  bind(uri(concat("urn:tmp:",str(?ROWNUM))) as ?ROW)
>            ?ROW tmp:coIds        [tmp:index ?INDEX; tmp:value ?coId]
>  optional {?ROW tmp:coNames      [tmp:index ?INDEX; tmp:value ?coName]}
>  optional {?ROW tmp:coIndustries [tmp:index ?INDEX; tmp:value ?coIndustry]}
>  bind(uri(concat("company/",?coId)) as ?CO)
>  bind(uri(concat("industry/",?coIndustry)) as ?INDUSTRY)
> }
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
