[Virtuoso-users] Not Giving up on HTML5/RDFa Import

2016-01-14 Thread Haag, Jason
Hi All,

I"m back again evaluating Virtuoso for the HTML5/RDFa crawling capability.
We are considering moving to the Universal Server from VOS if I can ever
prove to my team that it will be a good choice for sponging and crawling
HTML5/RDFa files. I have been testing this feature periodically over the
past several months with no luck. I appreciate the support and feedback so
far, but I haven't made any progress. Some previous posts/inquiries I made
on this topic are available here:
http://sourceforge.net/p/virtuoso/mailman/message/34507072/ and here:
http://sourceforge.net/p/virtuoso/mailman/virtuoso-users/thread/CAHjqjnLo7-hiA30neYBsbGm93HeXe%3DHrda5rZPGS%3Dwm%2B08ZvBw%40mail.gmail.com/#msg34525370

I would really like to use the conductor interface to regularly schedule
the import of several graph IRIs that contain RDFa and check the triples for
any additions on a daily basis. I recently upgraded the installation to VOS
7.2.3 and still can't seem to get the RDFa data to populate the data store.
After I run the import from the queue, whenever I query the Virtuoso database
there is no data from the RDFa datasets that I have imported through
conductor. I must be doing something wrong or missing an important step
somewhere. However, if I use these same RDFa IRIs with the isql-v
function (DB.DBA.RDF_LOAD_RDFA), the triples load successfully.

Here's a summary of what I've done and discovered so far:

1) Installed VOS 7.2.3 successfully
2) Read some of the newly updated documentation, which is excellent by the
way
3) Checked/updated sponger privileges per this guidance for securing the
endpoint:
http://docs.openlinksw.com/virtuoso/rdfsparql.html#rdfsupportedprotocolendpointuri
4) Installed cartridges_dav.vad from commercial version (for sponger
cartridges):
http://opldownload.s3.amazonaws.com/uda/vad-packages/7.2/cartridges_dav.vad
5) Checked and configured the xHTML (aka HTML5 and variants) cartridge under
"extractor cartridges" with the following settings (per advice from the
mailing list/forums):

Pattern: (application/xhtml.xml)|(text|application)/.*(html|xml)
fallback-mode=no
rdfa=yes
reify_html5md=1
reify_rdfa=0
reify_jsonld=0
reify_all_grddl=0
passthrough_mode=yes
loose=yes
reify_html=0
reify_html_misc=0
reify_turtle=no


I also tried this more basic configuration:
add-html-meta=no
get-feeds=no
rdfa=yes
fallback-mode=no
reify_html=no
reify_html_misc=no
reify_html5md=no
reify_rdfa=no
reify_jsonld=no
reify_turtle=no
reify_all_grddl=no
passthrough_mode=no
loose=no

6) Created a content import for the HTML5/RDFa document using conductor
with the following options:

Target URL: http://xapi.vocab.pub/datasets/adl/verbs
login owner: dba
checked the following

   - store documents locally
   - run sponger
   - store metadata (selected xHTML aka HTML5 and variants)


7) Ran the import; 0/1 pages/sites were retrieved, and the error I looked up
was: "XM003: XML parser detected an error: ERROR : Tag nesting
error: name 'head' of end tag does not match the name 'link' of start tag
at line 19 column 108 at line 20 column 9 of source text  ---^"
8) This appears to be a validation error looking for closing tags of the
meta and link elements. It appears the content import isn't checking my
doctype declaration. HTML5 doesn't require closing the meta or link
elements, whereas XHTML does.
9) Updated the HTML5 to close the meta and link tags as a workaround, to see
if the error would go away. It did!
10) Created a new import targeting the updated HTML5 with closing tags. This
time, no errors, and 1 site was retrieved successfully (
http://xapi.vocab.pub/datasets/adl/verbs)
11) Checked whether the named graph and triples populated the database.
Nothing there (a more targeted follow-up check is sketched below):
SPARQL SELECT DISTINCT ?g WHERE {GRAPH ?g { ?s ?p ?o . }}
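
A more targeted check would be to count the triples in the graph I expect the
crawl to create (a sketch only; this assumes the sponger uses the target URL
as the graph IRI, which may not be the case):

SQL> SPARQL SELECT (COUNT(*) AS ?c)
     WHERE { GRAPH <http://xapi.vocab.pub/datasets/adl/verbs> { ?s ?p ?o } };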

Here are some strange things I noticed that could be causing issues. Not
sure if anyone can explain what's happening here.

   - Even though the content type is text/html and explicitly defined as
   such in the HTML meta tag, the file is being stored in WebDAV with the
   "application/xhtml+xml" content type
   - Even though I assigned dba as the content owner, it is assigning dav as
   the content owner
   - After the import queue is run, two files are created and stored in
   DAV/home/dba/rdf_sink even though I selected the option to store a single
   file: (verbs and urn_dav_home_dba_rdf_sink.RDF). If I access the verbs file
   in WebDAV it renders the HTML that was imported. If I click
   the urn_dav_home_dba_rdf_sink.RDF it is not available. Note: the verbs file
   is being stored with the "application/xhtml+xml" content type and
   the urn_dav_home_dba_rdf_sink.RDF is being stored as text/xml in WebDAV
   (a query to inspect this directly is sketched after this list).
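
(In case it helps with diagnosis: the stored content types can also be
inspected directly in the DAV resource table. This is only a sketch and I
haven't double-checked the column names.)

SQL> SELECT RES_FULL_PATH, RES_TYPE
     FROM WS.WS.SYS_DAV_RES
     WHERE RES_FULL_PATH LIKE '/DAV/home/dba/rdf_sink/%';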


After all of this I decided to check and see if I could load the HTML5/RDFa
document using isql-v:

SQL> DB.DBA.RDF_LOAD_RDFA (http_get('http://xapi.vocab.pub/datasets/adl/verbs/'),
     'http://xapi.vocab.pub/datasets/adl/verbs/#',
     'http://xapi.vocab.pub/datasets/adl/verbs');

This worked, and the graph and triples are in the data store.
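
Since the isql route works, one fallback I'm considering is wrapping that call
in a Virtuoso scheduled event so it re-runs daily. This is just a sketch; I'm
assuming SYS_SCHEDULED_EVENT is the right table for this and I haven't tested
it yet:

SQL> INSERT INTO DB.DBA.SYS_SCHEDULED_EVENT (SE_NAME, SE_START, SE_SQL, SE_INTERVAL)
     VALUES ('reload_adl_verbs', now(),
             'DB.DBA.RDF_LOAD_RDFA (http_get(''http://xapi.vocab.pub/datasets/adl/verbs/''), ''http://xapi.vocab.pub/datasets/adl/verbs/#'', ''http://xapi.vocab.pub/datasets/adl/verbs'')',
             1440);  -- 1440 minutes = run once a day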

[Virtuoso-users] Faceted Browser

2016-01-14 Thread Haag, Jason
Hi All,

Is there a way to customize the faceted browser featured views and content?

I followed the steps here:
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtFCTFeatureQueries

I was able to create some featured views, but would like to remove some of
the old ones. I tried this by changing the value to "0" for the featured
flag.

update fct_stored_qry set fsq_featured=0 where fsq_id=0;

But the old featured view is still showing up. Is there a way to clear the
faceted data and start over? Also, where would one edit the other text on this
page, beyond the featured facets and SPARQL queries, to add other
instructions?
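
If removing rows from fct_stored_qry is actually the supported way to start
over, I would expect something like the following to do it (a guess based only
on the table and columns above; I haven't confirmed the schema or whether a
direct delete is safe):

SQL> DELETE FROM fct_stored_qry WHERE fsq_id = 0;
SQL> COMMIT WORK;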

Thanks,

J Haag


Re: [Virtuoso-users] Not Giving up on HTML5/RDFa Import

2016-01-14 Thread Kingsley Idehen
On 1/14/16 11:17 AM, Haag, Jason wrote:
> Hi All,
>
> I"m back again evaluating Virtuoso for the HTML5/RDFa crawling
> capability. We are considering moving to the Universal Server from VOS
> if I can ever prove to my team that it will be a good choice for
> sponging and crawling HTML5/RDFa files.

It most certainly is.

> I have been testing this feature periodically over the past several
> months with no luck. I appreciate the support and feedback so far, but
> I haven't made any progress. Some previous posts/inquiries I made on
> this topic are available
> here: http://sourceforge.net/p/virtuoso/mailman/message/34507072/ and
> here: 
> http://sourceforge.net/p/virtuoso/mailman/virtuoso-users/thread/CAHjqjnLo7-hiA30neYBsbGm93HeXe%3DHrda5rZPGS%3Dwm%2B08ZvBw%40mail.gmail.com/#msg34525370
>
> I would really like to use the conductor interface to regularly
> schedule the import of several graph IRIs that contain RDFa and check the
> triples for any additions on a daily basis. I recently upgraded the
> installation to VOS 7.2.3 and still can't seem to get the RDFa data to
> populate the data store.

Why don't you approach this matter as follows:

[1] Use the live instance at http://linkeddata.uriburner.com to import
your target data sources
[2] Compare that with what's happening on your local instance.

> After I run the import from the queue, whenever I query the Virtuoso
> database there is no data from the RDFa datasets that I have imported
> through conductor. I must be doing something wrong or missing an
> important step somewhere. However, if I use these same RDFa IRIs
> with the isql-v function (DB.DBA.RDF_LOAD_RDFA), the triples load
> successfully.

Yes, so there is something amiss in your setup. Your import/crawl jobs
should include directives for invoking the sponger cartridge for HTML docs.
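
As a quick sanity check that the cartridge pipeline itself works outside of a
crawl job, you can also trigger the sponger directly from isql via the
get:soft pragma (a sketch, using your target URL as the graph IRI):

SQL> SPARQL
     DEFINE get:soft "replace"
     SELECT (COUNT(*) AS ?c)
     FROM <http://xapi.vocab.pub/datasets/adl/verbs>
     WHERE { ?s ?p ?o };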
>
> Here's a summary of what I've done and discovered so far:
>
> 1) Installed VOS 7.2.3 successfully
> 2) Read some of the newly updated documentation, which is excellent by
> the way
> 3) Checked/updated sponger privileges per this guidance for securing
> the
> endpoint: 
> http://docs.openlinksw.com/virtuoso/rdfsparql.html#rdfsupportedprotocolendpointuri
> 4) Installed cartridges_dav.vad from commercial version (for sponger
> cartridges):
> http://opldownload.s3.amazonaws.com/uda/vad-packages/7.2/cartridges_dav.vad
> 5) Checked and configured the xHTML (aka HTML5 and variants) cartridge
> under “extractor cartridges” with the following settings (per advice
> from the mailing list/forums): 
>
> Pattern: (application/xhtml.xml)|(text|application)/.*(html|xml)
> fallback-mode=no
> rdfa=yes
> reify_html5md=1
> reify_rdfa=0

For now (while you are troubleshooting), also use: reify_rdfa=1
> reify_jsonld=0
> reify_all_grddl=0
> passthrough_mode=yes
> loose=yes
> reify_html=0
> reify_html_misc=0
> reify_turtle=no
>
>
> I also tried this basic configuration as well:
> add-html-meta=no
> get-feeds=no
> rdfa=yes
> fallback-mode=no
> reify_html=no
> reify_html_misc=no
> reify_html5md=no
> reify_rdfa=no
> reify_jsonld=no
> reify_turtle=no
> reify_all_grddl=no
> passthrough_mode=no
> loose=no
>
> 6) Created a content import for the HTML5/RDFa document using
> conductor with the following options:
>
> Target URL: http://xapi.vocab.pub/datasets/adl/verbs
> login owner: dba
> checked the following
>
>   * store documents locally
>   * run sponger
>   * store metadata (selected xHTML aka HTML5 and variants)
>

Go to: https://www.pinterest.com/pin/389561436498376210/ -- this shows
your content via the lenses of our OSDS browser extension.

>
> 7) Ran the import; 0/1 pages/sites were retrieved, and the error I looked
> up was: "XM003: XML parser detected an error: ERROR : Tag
> nesting error: name 'head' of end tag does not match the name 'link'
> of start tag at line 19 column 108 at line 20 column 9 of source text
>  ---^"
> 8) This appears to be a validation error looking for closing tags of
> the meta and link elements. It appears the content import isn't
> checking my doctype declaration. HTML5 doesn't require closing the
> meta or link elements, whereas XHTML does.
> 9) Updated the HTML5 to close the meta and link tags to work around
> this to see if the error would go away. It did!
> 10) Created a new import targeting the updated HTML5 with closing tags.
> This time, no errors, and 1 site was retrieved successfully
> (http://xapi.vocab.pub/datasets/adl/verbs)
> 11) Checked whether the named graph and triples populated the
> database. Nothing there.
> SPARQL SELECT DISTINCT ?g WHERE {GRAPH ?g { ?s ?p ?o . }}
>
> Here are some strange things I noticed that could be causing issues.
> Not sure if anyone can explain what's happening here.
>
>   * Even though the content type is text/html and explicitly defined
> as such in the HTML metatag, the file is being stored in webdav as
> the "application/xhtml+xml" content type
>

That's fine. Virtuoso is doing that.

>   * Even though I assigned dba as 

Re: [Virtuoso-users] Not Giving up on HTML5/RDFa Import

2016-01-14 Thread Haag, Jason
Thanks for the response, Kingsley. I will try some of the other tips you
provided. I have a few questions for clarification:

1) When you say to use the live instance of URIBurner, do you mean that
the import URI would be
http://linkeddata.uriburner.com/about/html/http/xapi.vocab.pub/datasets/adl/verbs
for the content import?
2) How would I compare URIBurner results with my Virtuoso instance when
nothing is being imported using the conductor interface? There would be
nothing to compare.
3) Is there any chance you could try importing this URI on your instance or
a test instance to see if you get the same results? If not, perhaps it is
unique to my setup.
4) I haven't tried the DET folder via webdav before. Is this the latest
documentation on that approach?
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksControlDefineGraphWithSpongeOption

Thank you.

On Thu, Jan 14, 2016 at 12:46 PM, Kingsley Idehen wrote:

> On 1/14/16 11:17 AM, Haag, Jason wrote:
>
> Hi All,
>
> I"m back again evaluating Virtuoso for the HTML5/RDFa crawling capability.
> We are considering moving to the Universal Server from VOS if I can ever
> prove to my team that it will be a good choice for sponging and crawling
> HTML5/RDFa files.
>
>
> It most certainly is.
>
> I have been testing this feature periodically over the past several months
> with no luck. I appreciate the support and feedback so far, but I haven't
> made any progress. Some previous posts/inquiries I made on this topic are
> available here:
> http://sourceforge.net/p/virtuoso/mailman/message/34507072/ and here:
> http://sourceforge.net/p/virtuoso/mailman/virtuoso-users/thread/CAHjqjnLo7-hiA30neYBsbGm93HeXe%3DHrda5rZPGS%3Dwm%2B08ZvBw%40mail.gmail.com/#msg34525370
>
> I would really like to use the conductor interface to regularly schedule
> the import of several graph IRIs that contain RDFa and check the triples for
> any additions on a daily basis. I recently upgraded the installation to VOS
> 7.2.3 and still can't seem to get the RDFa data to populate the data store.
>
>
> Why don't you approach this matter as follows:
>
> [1] Use the live instance at http://linkeddata.uriburner.com to import
> your target data sources
> [2] Compare that with what's happening on your local instance.
>
> After I run the import from the queue, whenever I query the Virtuoso database
> there is no data from my RDFa datasets that I have imported through
> conductor. I must be doing something wrong or missing an important step
> somewhere. However, if I use these same exact RDFa IRIs using the isql-v
> function (DB.DBA.RDF_LOAD_RDFA) the triples load successfully.
>
>
> Yes, so there is something amiss in your setup. Your import/crawl jobs
> should include directives for invoking the sponger cartridge for HTML docs.
>
>
> Here's a summary of what I've done and discovered so far:
>
> 1) Installed VOS 7.2.3 successfully
> 2) Read some of the newly updated documentation, which is excellent by the
> way
> 3) Checked/updated sponger privileges per this guidance for securing the
> endpoint:
> 
> http://docs.openlinksw.com/virtuoso/rdfsparql.html#rdfsupportedprotocolendpointuri
> 4) Installed cartridges_dav.vad from commercial version (for sponger
> cartridges):
> 
> http://opldownload.s3.amazonaws.com/uda/vad-packages/7.2/cartridges_dav.vad
> 5) Checked and configured the xHTML (aka HTML5 and variants) cartridge
> under “extractor cartridges” with the following settings (per advice from
> the mailing list/forums):
>
> Pattern: (application/xhtml.xml)|(text|application)/.*(html|xml)
> fallback-mode=no
> rdfa=yes
> reify_html5md=1
> reify_rdfa=0
>
>
> For now (while you are troubleshooting), also use: reify_rdfa=1
>
> reify_jsonld=0
> reify_all_grddl=0
> passthrough_mode=yes
> loose=yes
> reify_html=0
> reify_html_misc=0
> reify_turtle=no
>
>
> I also tried this basic configuration as well:
> add-html-meta=no
> get-feeds=no
> rdfa=yes
> fallback-mode=no
> reify_html=no
> reify_html_misc=no
> reify_html5md=no
> reify_rdfa=no
> reify_jsonld=no
> reify_turtle=no
> reify_all_grddl=no
> passthrough_mode=no
> loose=no
>
> 6) Created a content import for the HTML5/RDFa document using conductor
> with the following options:
>
> Target URL: http://xapi.vocab.pub/datasets/adl/verbs
> login owner: dba
> checked the following
>
>- store documents locally
>- run sponger
>- store metadata (selected xHTML aka HTML5 and variants)
>
>
> Go to: https://www.pinterest.com/pin/389561436498376210/ -- this shows
> your content via the lenses of our OSDS browser extension.
>
>
> 7) Ran the import; 0/1 pages/sites were retrieved, and the error I looked
> up was: "XM003: XML parser detected an error: ERROR : Tag nesting
> error: name 'head' of end tag does not match the name