Re: Clerezza Yard setup and SPARQL

Rajan Shah Sat, 30 May 2015 14:15:54 -0700

Hi,

I can now create the Clerezza Yard successfully and query the data using
SPARQL. However, when it comes to Named Entity Recognition, I still get no
results: no entities are linked.


I would appreciate it if someone could provide some insight or a potential
resolution.

Thanks in advance,
Rajan

These are the steps I followed:

1. Uploaded the relevant ontology to the local OntoNet

2. Created a Managed Site and uploaded the triples

3. Verified that the data exists via a SPARQL query:

<results>
<result>
<binding name="ticker"><literal>AAPL</literal></binding>
<binding name="issuer"><literal>Apple Inc.</literal></binding>
<binding name="exchange"><literal>NASDAQ</literal></binding>
<binding name="currency"><literal>USD</literal></binding>
<binding name="instr"><uri>http://finance.intellimind.io/secmaster/djia/AAPL</uri></binding>
</result>
</results></sparql>
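
The query itself is not shown here. A query of roughly the following shape
would produce bindings like the ones above. This is only a sketch: the
trailing "/" on the imind namespace and the property names imind:issuer and
imind:currency are assumptions (only imind:ticker and imind:exchange appear
in the mapping in step 4).

```sparql
# Assumption: the imind namespace ends in "/"; adjust to the real namespace IRI
PREFIX imind: <http://finance.intellimind.io/secmaster/>

SELECT ?instr ?ticker ?issuer ?exchange ?currency
WHERE {
  ?instr imind:ticker   ?ticker ;
         imind:exchange ?exchange .
  # imind:issuer and imind:currency are assumed property names
  OPTIONAL { ?instr imind:issuer   ?issuer }
  OPTIONAL { ?instr imind:currency ?currency }
}
```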

4. Entityhub Linking

Assume the prefix imind stands for http://finance.intellimind.io/secmaster
(stated so that the namespace prefix can be verified).

In the Entityhub linking setup, within the type mapping, I am trying to map
the following:

a. Type Mapping Setup
imind:ticker > rdfs:label
imind:exchange > rdfs:label
...

b. Selected "Case Sensitivity"


5. Chain setup

When I include the linking engine in the chain listed below, it does not
capture a single entity, even though most of the processing time is spent in
this particular chain.



   - *tika* ( optional , TikaEngine)
   - *langdetect* ( required , LanguageDetectionEnhancementEngine)
   - *opennlp-sentence* ( required , OpenNlpSentenceDetectionEngine)
   - *opennlp-token* ( required , OpenNlpTokenizerEngine)
   - *opennlp-pos* ( required , OpenNlpPosTaggingEngine)
   - *opennlp-ner* ( required , NamedEntityExtractionEnhancementEngine)
   - *refdata-linking* ( required , EntityLinkingEngine)


*Sample Text:*

The Apple Inc. CEO Tim Cook spoke at dev conference. The Apple Inc. has
headquarter in US. It's ticker symbol is AAPL, which trades on NASDAQ.

On Mon, May 25, 2015 at 12:04 AM, Rajan Shah wrote:

> Hi,
>
> To try out the Clerezza Yard setup, I tried the very simple example
> outlined at the end.
>
> I would really appreciate it if someone could shed some light on the
> following:
>
> a. Is there anything I am just completely missing here pertaining to
> "Named Graph" vs "Unions of Graphs" and reference? If that's the case,
> could you please clarify what would be relevant URI/IRI?
>
> b. What is the best way to debug such an issue? If a SPARQL query fails,
> where should I look for logs indicating the problem, since nothing
> appears in the stdout logs?
>
> c. Is there a simpler alternative that achieves similar functionality?
> Is storing the data in Kiwi beneficial compared to this approach, and do I
> have to have Apache Marmotta installed in order to use Kiwi?
>
> Thanks in advance,
> Rajan
>
>
> *1. Apache Stanbol Entityhub Yard: Clerezza Yard Configuration*
>
> Set the following parameters:
>
> ID: testYard
> Graph URI: http://test.io/ns/friends#
>
> *2. Setup Clerezza - SCB Jena TDB Storage Provider*
>
> Jena TDB directory: /<stanbol_dir>/<tdb_store>
> Default Graph Name: http://test.io/ns
> Weight: 105
>
> *3. Save the .ttl file into /<stanbol_dir>/<tdb_store>*
>
> @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
> @prefix friends: <http://test.io/ns/friends#> .
>
> <http://test.io/ns/friends#AndrewSmith> a vcard:Individual;
>                     vcard:fn "Andrew Smith";
>                     vcard:title "Founder";
>                     vcard:org "ABC LLC";
>                     vcard:orgunit "Startup";
>                     vcard:hasAddress [
>                                         a vcard:Work;
>                                         vcard:country-name "USA";
>                                         vcard:locality "New York";
>                                         vcard:region "New York"
>                     ] .
>
> *4. Upon startup, it creates the necessary index files within the*
> /<stanbol_dir>/<tdb_store>
> directory. In addition, the UI shows that it registers the following
> TripleCollection in the SPARQL endpoint:
>
> http://test.io/ns/friends#
>
> *5. SPARQL Query*
> -- query1
> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
> PREFIX friends: <http://test.io/ns/friends#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>
> SELECT ?fn ?title ?org
> WHERE {
>   ?s vcard:fn ?fn ;
>     vcard:title ?title ;
>     vcard:org ?org .
> }
>
> OR
>
> -- query2
> PREFIX hmgr: <http://test.io/ns/friends#>
> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>
> SELECT ?Individual ?title
> WHERE { ?title  vcard:title  "Founder" }
>
>
> *Observations:*
>
> The above queries work perfectly fine both on the command line and in Jena
> Fuseki, as follows:
>
> a. tdbquery --loc /<stanbol_dir>/<tdb_store> --query query1
> b. using fuseki user interface
>
> I tried a couple of alternatives such as GRAPH, NAMED, etc.; however,
> nothing helps. Is there any specific syntax that needs to be used for the
> Stanbol SPARQL interface?
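>
> For reference, a GRAPH-scoped variant of query1 would look roughly like the
> sketch below. It assumes the triples are stored in the named graph
> http://test.io/ns/friends# (the Graph URI configured for testYard); whether
> the Stanbol SPARQL endpoint expects this form is exactly the open question.
>
> ```sparql
> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>
> # Assumption: the data lives in the named graph set as the Yard's Graph URI
> SELECT ?fn ?title ?org
> WHERE {
>   GRAPH <http://test.io/ns/friends#> {
>     ?s vcard:fn ?fn ;
>        vcard:title ?title ;
>        vcard:org ?org .
>   }
> }
> ```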
>
>
>
>
