Re: Clerezza Yard setup and SPARQL

Rajan Shah Tue, 09 Jun 2015 03:52:40 -0700

Hi Rupert,

Thanks a lot for the clarification!


It makes sense.

With best regards,
Rajan

On Mon, Jun 8, 2015 at 3:50 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi,
>
> The SolrYard does not support BNodes and the VCard RDF tends to use those.
>
> If you use the Entityhub Indexing Tool for importing the data you can
> try to set the "bnode-prefix" for the rdf indexing source (see
> STANBOL-765 [1] for details)
>
> best
> Rupert
>
> [1] https://issues.apache.org/jira/browse/STANBOL-765
>
> On Tue, Jun 2, 2015 at 6:13 PM, Rajan Shah <raja...@gmail.com> wrote:
> > Hi Rupert,
> >
> > Thanks again for the response.
> >
> > At present, it's just an observation that I had issues with queries
> > mainly with vcard. At the same time, I could get results with either
> > custom entities or even foaf.
> >
> > I will keep an eye on it and, if I observe it again, will submit a JIRA issue.
> >
> > With best regards,
> > Rajan
> >
> >
> >
> > On Tue, Jun 2, 2015 at 11:03 AM, Rupert Westenthaler <
> > rupert.westentha...@gmail.com> wrote:
> >
> >> Hi Rajan,
> >>
> >> Sorry, I do not have enough time for a detailed answer. But the
> >> bottom line is: EntityLinking does not work with the Clerezza Yard. Even
> >> if you did not encounter errors, both performance and results would
> >> be much worse than with a SolrYard. This is because EntityLinking
> >> depends on features that are Solr-exclusive (e.g. the Solr Analyzers
> >> doing stemming ... and the ranking of query results).
> >>
> >> If you find failing SPARQL queries in the log, feel free to report them
> >> as issues in Jira. I will have a look.
> >>
> >> best
> >> Rupert
> >>
> >> On Sat, May 30, 2015 at 11:14 PM, Rajan Shah <raja...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I can create a Clerezza Yard successfully and query the data using
> >> > SPARQL. Now, when it comes to Named Entity Recognition, the same issue
> >> > persists.
> >> >
> >> > I would appreciate it if someone could provide some insight or a
> >> > potential resolution.
> >> >
> >> > Thanks in advance,
> >> > Rajan
> >> >
> >> > These are the steps I followed:
> >> >
> >> > 1. Uploaded relevant ontology to local ontonet
> >> >
> >> > 2. Created Managed Site, uploaded triples
> >> >
> >> > 3. Verified the data exists via SPARQL query:
> >> >
> >> > <sparql><results>
> >> > <result>
> >> > <binding name="ticker"><literal>AAPL</literal></binding>
> >> > <binding name="issuer"><literal>Apple Inc.</literal></binding>
> >> > <binding name="exchange"><literal>NASDAQ</literal></binding>
> >> > <binding name="currency"><literal>USD</literal></binding>
> >> > <binding name="instr">
> >> > <uri>http://finance.intellimind.io/secmaster/djia/AAPL</uri>
> >> > </binding>
> >> > </result>
> >> > </results></sparql>
> >> >
> >> > 4. Entityhub Linking
> >> >
> >> > Assuming the prefix imind is http://finance.intellimind.io/secmaster
> >> > (so that the namespace prefix can be verified).
> >> >
> >> > In the Entityhub linking setup, within the type mapping, I am trying to map:
> >> >
> >> > a. Type Mapping Setup
> >> > imind:ticker > rdfs:label
> >> > imind:exchange > rdfs:label
> >> > ...
> >> >
> >> > b. Select "Case Sensitivity"
> >> >
> >> >
> >> > 5. Chain setup
> >> >
> >> > When I included it in the chain listed below, it did not capture a
> >> > single entity, even though most of the time was spent in this
> >> > particular chain.
> >> >
> >> >
> >> >
> >> >    - *tika* ( optional , TikaEngine)
> >> >    - *langdetect* ( required , LanguageDetectionEnhancementEngine)
> >> >    - *opennlp-sentence* ( required , OpenNlpSentenceDetectionEngine)
> >> >    - *opennlp-token* ( required , OpenNlpTokenizerEngine)
> >> >    - *opennlp-pos* ( required , OpenNlpPosTaggingEngine)
> >> >    - *opennlp-ner* ( required , NamedEntityExtractionEnhancementEngine)
> >> >    - *refdata-linking* ( required , EntityLinkingEngine)
> >> >
> >> >
> >> > *Sample Text:*
> >> >
> >> > The Apple Inc. CEO Tim Cook spoke at dev conference. The Apple Inc. has
> >> > headquarter in US. It's ticker symbol is AAPL, which trades on NASDAQ.
> >> >
> >> > On Mon, May 25, 2015 at 12:04 AM, Rajan Shah <raja...@gmail.com> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> In order to use the Clerezza Yard setup, I tried the very simple
> >> >> example outlined at the end.
> >> >>
> >> >> I would really appreciate it if someone could shed some light on the
> >> >> following:
> >> >>
> >> >> a. Is there anything I am completely missing here pertaining to
> >> >> "Named Graph" vs "Unions of Graphs" and reference? If that's the case,
> >> >> could you please clarify what the relevant URI/IRI would be?
> >> >>
> >> >> b. What is the best way to debug such an issue? If a SPARQL query fails,
> >> >> where should I look for logs indicating the issue, as nothing appears
> >> >> in the stdout logs?
> >> >>
> >> >> c. Is there any simpler alternative that achieves similar
> >> >> functionality? Is storing in Kiwi beneficial compared to this
> >> >> approach, or do I have to have Apache Marmotta installed in order to
> >> >> use Kiwi?
> >> >>
> >> >> Thanks in advance,
> >> >> Rajan
> >> >>
> >> >>
> >> >> *1. Apache Stanbol Entityhub Yard: Clerezza Yard Configuration*
> >> >>
> >> >> Set the following parameters:
> >> >>
> >> >> ID: testYard
> >> >> Graph URI: http://test.io/ns/friends#
> >> >>
> >> >> *2. Setup Clerezza - SCB Jena TDB Storage Provider*
> >> >>
> >> >> Jena TDB directory: /<stanbol_dir>/<tdb_store>
> >> >> Default Graph Name: http://test.io/ns
> >> >> Weight: 105
> >> >>
> >> >> *3. Save the .ttl file into /<stanbol_dir>/<tdb_store>*
> >> >>
> >> >> @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
> >> >> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
> >> >> @prefix friends: <http://test.io/ns/friends#> .
> >> >>
> >> >> <http://test.io/ns/friends#AndrewSmith> a vcard:Individual;
> >> >>                     vcard:fn "Andrew Smith";
> >> >>                     vcard:title "Founder";
> >> >>                     vcard:org "ABC LLC";
> >> >>                     vcard:orgunit "Startup";
> >> >>                     vcard:hasAddress [
> >> >>                                         a vcard:Work;
> >> >>                                         vcard:country-name "USA";
> >> >>                                         vcard:locality "New York";
> >> >>                                         vcard:region "New York"
> >> >>                     ] .
> >> >>
> >> >> *4.* I do see that, upon startup, it creates the necessary index files
> >> >> within the /<stanbol_dir>/<tdb_store> directory. In addition, within the
> >> >> UI, it also registers the following TripleCollections in the SPARQL
> >> >> Endpoint:
> >> >>
> >> >> http://test.io/ns/friends#
> >> >>
> >> >> *5. SPARQL Query*
> >> >> -- query1
> >> >> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
> >> >> PREFIX friends: <http://test.io/ns/friends#>
> >> >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >> >>
> >> >> SELECT ?fn ?title ?org
> >> >> WHERE {
> >> >>   ?s vcard:fn ?fn ;
> >> >>     vcard:title ?title ;
> >> >>     vcard:org ?org .
> >> >> }
> >> >>
> >> >> OR
> >> >>
> >> >> -- query2
> >> >> PREFIX hmgr: <http://test.io/ns/friends#>
> >> >> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
> >> >>
> >> >> SELECT ?individual
> >> >> WHERE { ?individual  vcard:title  "Founder" }
> >> >>
> >> >>
> >> >> *Observations:*
> >> >>
> >> >> The above queries work perfectly fine on either the command line or
> >> >> Jena Fuseki, as follows:
> >> >>
> >> >> a. tdbquery --loc /<stanbol_dir>/<tdb_store> --query query1
> >> >> b. using the Fuseki user interface
> >> >>
> >> >> I tried a couple of alternatives such as GRAPH, NAMED, etc.; however,
> >> >> nothing helps. Is there any specific syntax that needs to be used for
> >> >> the Stanbol SPARQL interface?
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >> | Bodenlehenstraße 11                              ++43-699-11108907
> >> | A-5500 Bischofshofen
> >> | REDLINK.CO
> >>
> >> ..........................................................................
> >> | http://redlink.co/
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>
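
Rupert's point about BNodes applies to the vCard sample quoted above, where
vcard:hasAddress points at a [ ... ] blank node. What follows is a minimal
sketch of the same data with the address given its own URI, assuming it is
acceptable to name the address node. The URI friends:AndrewSmithWorkAddress
is hypothetical and does not appear in the thread, and the thread does not
confirm that this change alone resolves the query problems.

```turtle
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix friends: <http://test.io/ns/friends#> .

# Same individual as in the quoted sample, but the address is a named
# resource instead of a [ ... ] blank node.
# friends:AndrewSmithWorkAddress is a made-up name for illustration only.
friends:AndrewSmith a vcard:Individual ;
    vcard:fn "Andrew Smith" ;
    vcard:title "Founder" ;
    vcard:org "ABC LLC" ;
    vcard:orgunit "Startup" ;
    vcard:hasAddress friends:AndrewSmithWorkAddress .

friends:AndrewSmithWorkAddress a vcard:Work ;
    vcard:country-name "USA" ;
    vcard:locality "New York" ;
    vcard:region "New York" .
```

When the source data cannot be edited, the "bnode-prefix" setting of the
Entityhub Indexing Tool that Rupert mentions (STANBOL-765) is the route
described in the thread.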
