Sparqlines: SPARQL to Sparkline

2016-07-25 Thread Sarven Capadisli

http://csarven.ca/sparqlines-sparql-to-sparkline

Sparqlines are statistical observations fetched from SPARQL endpoints 
and displayed as sparklines (inline charts) embedded in context. The 
article describes an implementation that is part of dokieli, a Web-based 
authoring tool: https://github.com/linkeddata/dokieli
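
For a rough flavour of the kind of query behind a sparqline - a sketch 
with a hypothetical dataset URI, not the exact query dokieli issues - 
one might select a time series of observations from a Data Cube-shaped 
dataset:

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

# One (period, value) pair per observation; the dataset URI is made up
SELECT ?period ?value
WHERE {
  ?observation a qb:Observation ;
    qb:dataSet <http://example.org/dataset/indicator-X> ;
    sdmx-dimension:refPeriod ?period ;
    sdmx-measure:obsValue ?value .
}
ORDER BY ?period

Each result row then maps to one point on the inline chart.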


Open feedback most welcome!

-Sarven
http://csarven.ca/#i



Re: Mediatypes that map to application/ld+json with a profile

2016-07-16 Thread Sarven Capadisli

On 2016-07-16 00:21, Ruben Verborgh wrote:

Hi Sarven, Phil, Rob,


Are there mediatypes that map to application/ld+json with a profile?


This begs the question why one would want to do that.


We're investigating to see what else, other than AS2, is making/stating 
such equivalences:


application/activity+json and application/ld+json; 
profile="http://www.w3.org/ns/activitystreams"


To the best of my knowledge, there aren't any.

While I wholeheartedly appreciate the discussion on whether that's a 
good or bad idea, that wasn't at all my intention. [Aside: you are 
welcome to dig into the Social Web WG's past discussions/meetings to see 
my strong position against the new mediatype.]


The investigation was towards Linked Data Notifications:

https://linkedresearch.org/ldn/

to see what it can mention in order to apply the robustness principle. 
It is a step towards helping people who wish to design flexible systems, 
so that they can be a bit more interoperable with other 
communities/implementations. That's being pragmatic.


LDN is not inventing or proposing a new mediatype; it doesn't have a 
need for one, nor are there any plans for one, and it is not promoting 
"non-LD stuff". Lastly, LDN has already marked this as at risk and is 
prepared to hand it off to the Social Web Protocols. 
https://github.com/csarven/ldn/issues/10 already addressed all of this, 
satisfied all commenters, and is marked as closed:


* https://linkedresearch.org/ldn/#sending-activitystreams2-support
* https://linkedresearch.org/ldn/#consuming-activitystreams2-support

-Sarven
http://csarven.ca/#i



Re: Mediatypes that map to application/ld+json with a profile

2016-07-15 Thread Sarven Capadisli

On 2016-07-15 09:44, Ghislain Atemezing wrote:

Are there mediatypes that map to application/ld+json with a profile?
Only aware of application/activity+json


Not happy with this content here https://www.w3.org/ns/formats/ ?
Especially this line https://www.w3.org/ns/formats/data/JSON-LD ?


Thanks Ghislain.

I meant the equivalences that can be made between JSON conventions and 
JSON-LD.


Is there anything out there that's treating a JSON mediatype as 
equivalent to application/ld+json + profile?


e.g., application/ld+json; 
profile="http://www.w3.org/ns/activitystreams" and 
application/activity+json is one of those, as far as I know.
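
For illustration, a hypothetical exchange in which a server applying 
such an equivalence would return the same representation for both of 
these requests:

--
Accept: application/activity+json
--
Accept: application/ld+json; profile="http://www.w3.org/ns/activitystreams"
--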


Also raised this in GeoJSON-LD since GeoJSON uses application/geo+json:
https://github.com/geojson/geojson-ld/issues/41

If any, what else is out there?

-Sarven
http://csarven.ca/#i



Mediatypes that map to application/ld+json with a profile

2016-07-15 Thread Sarven Capadisli

Dear LazySW,

Are there mediatypes that map to application/ld+json with a profile? 
Only aware of application/activity+json


-Sarven
http://csarven.ca/#i



Linked Data Notifications

2016-07-11 Thread Sarven Capadisli

Hi all,

We are working on a protocol called Linked Data Notifications (LDN) to 
facilitate exchanging messages between applications (senders, receivers, 
and consumers):


https://linkedresearch.org/ldn/

It is not dependent on LDP, but is compatible with existing LDP server 
implementations.


For an example of one way this can be used, see an annotation on the 
spec itself. The annotation is stored in the annotation author's own 
storage; when it was created, a notification was sent to the document's 
inbox according to LDN, and that notification is also retrieved and used 
to display the annotation.
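
As a rough sketch of that flow (URLs hypothetical; see the spec for the 
normative details), a sender discovers the target's Inbox via the LDP 
vocabulary and POSTs a notification to it:

--
GET /article HTTP/1.1
Host: example.org

HTTP/1.1 200 OK
Link: <http://example.org/inbox/>; rel="http://www.w3.org/ns/ldp#inbox"
--
POST /inbox/ HTTP/1.1
Host: example.org
Content-Type: application/ld+json

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Announce",
  "actor": "http://example.net/profile#me",
  "object": "http://example.net/annotation"
}
--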


LDN is an Editor's Draft at the W3C Social Web Working Group. Folks are 
invited to get involved at the WG or through the GitHub issues:


https://github.com/csarven/ldn/issues/

There is a Gitter chat: https://gitter.im/csarven/ldn and we (csarven 
and rhiaro) are also in #swig on Freenode and #social on irc.w3.org, 
happy to discuss and take feedback there too.


-Sarven
http://csarven.ca/#i



Re: CFP: First International Workshop on Reproducible Open Science (RepScience 2016)

2016-05-10 Thread Sarven Capadisli

On 2016-05-10 08:39, Herbert Van de Sompel wrote:

Sarven,

I am a fan of your linked research work. But I think it's a bit unjust
to characterize D-Lib Magazine as fitting in the category "via paper and
desktop/print centric tools and formats."



D-Lib is, and has since its start in 1995, been an HTML-only journal
that has served the Digital Library community very well. Just recently,
I published a paper [1] in D-Lib in which the editors agreed  to allow
me to diverge from their template in order to demonstrate the Robust
Links [2] approach to combat reference rot in scholarly communication.


Thank you Herbert.

I'm aware of D-Lib, and it is fantastic that they gave room to exemplify 
your work to the greatest extent possible.


I was merely pointing at the workshop in particular because that's the 
primary point of engagement with the community. Is it encouraging the 
methods to share, reuse, and reproduce that it stands behind? Oscar's 
second email certainly comes across that way (and that's a lot more 
reassuring than the first - at least to me).


There is much more to be said about encouraging and enabling the 
community (this was discussed a number of times on these mailing lists, 
as I'm sure you well know). The point that tends to circle back around 
is that, if you ask a researcher to submit in X, they will most 
certainly submit in X. They will also pass that knowledge (the whole 
process) on to their colleagues. So, if we for instance ask researchers 
coming into the field to embrace Webby submissions, we should be able to 
phase out the desktop/print mentality, especially in Web Science.


None of this is to suggest that people should be using tools that they 
don't want to or cannot use - needless to say, we need to be considerate 
about accessibility - but rather that we should take measures towards 
some interoperability between the research output, instead of sending it 
out to a black hole. Nor is it to suggest that print is bad. The 
fundamental difference is that some of the formats and mediums through 
which we ask the community to expose their work on the Web (of all 
places) tend to be severely limited right from the start. I think we can 
do better.


To take this workshop as an example, its submission requirements are no 
different from the calls of events that work with the "publishers" who 
are practically indifferent about any of this as long as it reduces 
their costs and maximises profits on all fronts. My point: the fact that 
D-Lib embraces the Web/HTML and friends is entirely hidden in this call. 
What remains is the expertise that (new) researchers compile during the 
process of submitting to this workshop - which tends to encourage the 
opposite.


Again, I'm merely suggesting that the voice of the community adapts to 
the state of the art. Technology is not the core problem. We always have 
social problems :)


Aside: it took "Linked Science" *4 years* to come around to this point.

https://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html
https://twitter.com/LinkedScience/status/729978893160026113

What changed? Absolutely nothing on the technology end, since everything 
was there right from the beginning - I had even demonstrated that at the 
time just to make the obvious point (via what is now known as 
https://dokie.li/ ). As far as I can see, the essential change appears 
to be on the social end.



We had tried to achieve the same with a paper about reference rot in
PLOS ONE [3] but our request was declined.


I was introduced to it by Shawn Jones at WWW2016: Persistent URIs Must 
Be Used To Be Persistent.



While I agree that D-Lib does not represent an incarnation of your
intended paradigm shift, I really don't think they are the enemy either.


Pardon me, but I had no intention or need to mark anyone as an enemy :) 
The focus is to encourage/enable researchers, organizers, and 
institutions to shift, while trying to keep it within reach by pinging 
the folks in Web Science, not all sciences.


This is especially why "Linked Research" is a proposed initiative to 
move towards. It is all open for discussion, and there are a number of 
ways to engage: https://linkedresearch.org/ . I have never asked for or 
demanded must-haves on technologies outside of what's "Webby". Not 
"selling" a tool here. :)



BTW: Maybe you could consider supporting Robust Links in your work. It's
all about long-term access and integrity of the web-based scholarly
record and hence should be of interest to you.


Thanks for bringing this up. I think we already cover those use cases 
in dokieli, but I added 
https://github.com/linkeddata/dokieli/issues/41#issuecomment-218147564 
to keep it on the radar in any case. I will take a closer look.


-Sarven
http://csarven.ca/#i


Cheers

Herbert

[1] Van de Sompel, H., and Nelson, M.L. (2015) Reminiscing About 15
Years of Interoperability Efforts. D-Lib Magazine, 21(11/12).
DOI:10.1045/november2015-vandesompel,
http://dx.doi.org/10.1045/november2015-vandesompel

[2] Robust Links spec. 

Re: CFP: First International Workshop on Reproducible Open Science (RepScience 2016)

2016-05-10 Thread Sarven Capadisli

On 2016-05-10 06:51, Oscar Corcho wrote:

## Paper Submission ##

Authors are invited to submit original, unpublished research papers.
Submitted manuscripts will have to be in the range of 4000-5000 words and
edited with OpenOffice Writer or Microsoft Word, following the "Matters of
style" section in the author guidelines for D-Lib Magazine.

Papers submitted to the workshop will undergo a single-blind peer-review
process by Program Committee members. Accepted papers will be published as
a special issue of the D-Lib Magazine journal, in the first Quarter of
2017. To be published on the proceedings, accepted contributions should be
revised according to the reviews and consider the feedback from the
workshop. Moreover, at least one author is required to register and
present the paper at the workshop.


Why is this workshop encouraging "reproducible" "open science" via paper 
and desktop/print centric tools and formats?


Is the intention to "reproduce" still based on classical methods? For 
example, how do you propose that the accepted works of this workshop be 
reproduced?


What do you think about taking the initiative towards this "paradigm shift":

http://csarven.ca/linked-research-scholarly-communication

If that is of interest, what do you think it would require for this 
workshop to embrace that?


-Sarven
http://csarven.ca/#i



Re: CFP: Linked Data for Information Extraction LD4IE2016 - workshop at @ISWC2016

2016-05-02 Thread Sarven Capadisli

On 2016-05-02 08:02, Anna Lisa Gentile wrote:

We would like to encourage you to submit your paper as HTML, in which
case you need to submit a zip archive containing an HTML file and all
used resources.


:) Thank you for this!


If you are new to HTML submission these are good places to start:
- Linked Research: Example paper using LNCS layout is at
http://linked-research.270a.info/


Just an FYI:

https://linkedresearch.org/ is an initiative. We would like researchers 
to embrace and build towards this "paradigm shift". Join the chat: 
https://gitter.im/linkedresearch/chat


https://github.com/linkeddata/dokieli is a client-side editor for 
decentralised article publishing, annotations, and social interactions. 
More about that here: http://csarven.ca/dokieli and https://dokie.li/


-Sarven
http://csarven.ca/#i



SemStats 2016 Call for Contributions

2016-04-29 Thread Sarven Capadisli

# SemStats 2016 Call for Contributions


## Document ID
http://semstats.org/2016/call-for-contributions


## Keywords
ISWC2016, SemStats, Linked Data, SDMX, Statistics, Statistical database, 
Data integration



## Event
4th International Workshop on Semantic Statistics co-located with 15th 
International Semantic Web Conference (ISWC 2016)



## Location
Kobe, Japan


## Date
October 17, 2016 or October 18, 2016


## Important dates
* Submission deadline: July 7th, 2016, 23:59 Hawaii time
* Notifications to authors: July 31st, 2016, 23:59 Hawaii time


## Workshop Summary
The goal of this workshop is to explore and strengthen the relationship 
between the Semantic Web and statistical communities, to provide better 
access to the data held by statistical offices. It will focus on ways in 
which statisticians can use Semantic Web technologies and standards in 
order to formalize, publish, document and link their data and metadata, 
and also on how statistical methods can be applied to linked data. It is 
the fourth workshop in a series that started at the International 
Semantic Web Conference in 2013 (SemStats 2013) and has run every year 
since at ISWC (SemStats 2014 and SemStats 2015).



## Topics
The workshop will address topics related to statistics and linked data. 
This includes but is not limited to:


### How to publish linked statistics?
* What are the relevant vocabularies for the publication of statistical 
data?
* What are the relevant vocabularies for the publication of statistical 
metadata (code lists and classifications, descriptive metadata, 
provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software 
packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data 
lifecycle?

* How do we establish, document and share best practices?

### How to use linked data for statistics?
* Where and how can we find statistics data: data catalogues, dataset 
descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability, 
etc.)?
* How can we perform data reconciliation, ontology matching and instance 
matching with statistical data?
* How can we apply statistical processes on linked data: data analysis, 
descriptive statistics, estimation, correction?
* How to intuitively represent statistical linked data: visual 
analytics, results of data mining?



## Submissions
This workshop is aimed at an interdisciplinary audience of researchers 
and practitioners involved or interested in Statistics and the Semantic 
Web. All contributions must represent original and unpublished work that 
is not currently under review. Contributions will be evaluated according 
to their significance, originality, technical content, style, clarity, 
and relevance to the workshop.


The workshop will welcome the following types of submissions:
* Full and short papers (up to 12 and 6 pages, respectively)
* Challenge papers (up to 12 pages)
* Demo papers (up to 6 pages)


## Awards
This year, SemStats will award prizes, thanks to the generous sponsoring 
of the CASD:

* The best contribution to the workshop will win €1000.
* The best challenge paper will win €500.

Please visit http://semstats.org/2016/call-for-contributions for more 
information. If you are interested in submitting a contribution but 
would like more preliminary information, please contact 
semstats2...@easychair.org.




-Sarven
http://csarven.ca/#i




Deprecating owl:sameAs

2016-04-01 Thread Sarven Capadisli
There is overwhelming research [1, 2, 3] and I think it is evident at 
this point that owl:sameAs is used inarticulately in the LOD cloud.


The research that I've done makes me conclude that we need to do a 
massive sweep of the LOD cloud and adopt owl:sameSameButDifferent.


I think the terminology is human-friendly enough that there will be 
minimal confusion down the line, but for the pedants among us, we can 
define it along the lines of:



The built-in OWL property owl:sameSameButDifferent links things to 
things. Such an owl:sameSameButDifferent statement indicates that two 
URI references actually refer to the same thing but may be different 
under some circumstances.
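
A hypothetical usage example, in Turtle:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# A and B refer to the same thing, except when they don't.
<http://example.org/thing/A>
  owl:sameSameButDifferent <http://example.net/thing/B> .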



Thoughts?

[1] https://www.w3.org/2009/12/rdf-ws/papers/ws21
[2] http://www.bbc.co.uk/ontologies/coreconcepts#terms_sameAs
[3] http://schema.org/sameAs

-Sarven
http://csarven.ca/#i



Re: Survey: Use of this list for Calls for Papers

2016-03-30 Thread Sarven Capadisli

On 2016-03-30 13:21, Phil Archer wrote:

Dear all,

A perennial topic at W3C is whether we should allow calls for papers to
be posted to our mailing lists. Many argue, passionately, that we should
not allow any CfPs on any lists. It is now likely that this will be the
policy, with any message detected as being a CfP marked as spam (and
therefore blocked).

Historically, the semantic-web and public-lod lists have been used for
CfPs and we are happy for this to continue *iff* you want it.

Last time we asked, the consensus was that CfPs were seen as useful, but
it's time to ask you again.

Please take a minute to answer the 4-question survey (no need for free
text) at https://www.w3.org/2002/09/wbs/1/1/

Thanks

Phil.



Excellent! Completed survey.

So, if this goes through, will there be public-lod-CfP-PDF (as per my 
suggestion)? :P


-Sarven
http://csarven.ca/#i



Re: CfP: ESWC2016 Workshop PROFILES'16 on Dataset Profiling and Federated Search for Linked Data

2016-02-22 Thread Sarven Capadisli

On 2016-02-22 11:53, Stefan Dietze wrote:

We welcome the following types of contributions.

 Short (up to 6 pages) and full (up to 15 pages) research papers
 Poster abstracts and system demonstrations should not exceed 4 pages

All submissions must be written in English and must be formatted
according to the Springer LNCS proceedings style.
Each submission will be reviewed by at least 3 members of the PC. Papers
will be evaluated according to their significance,
originality, technical content, style, clarity, and relevance to the
workshop. Please submit your contributions electronically
in PDF format via the Easychair system:
https://www.easychair.org/conferences/?conf=profiles2016.


Let's say that we want to profile research articles in Web Science, or 
the past articles from the *Workshop on PROFILES on Dataset Profiling 
and Federated Search for Linked Data*. Do you suggest that we first 
RDFize the PDF articles, then publish the data following the LD design 
principles, and then get down to profiling?


What's the recommended way for the community to extract the knowledge 
out of this workshop in the future?


-Sarven
http://csarven.ca/#i



Re: ESWC 2016 - Call for Challenge: Semantic Publishing

2016-02-18 Thread Sarven Capadisli

On 2016-02-18 11:48, Angelo Di Iorio wrote:

Linked Research (https://github.com/csarven/linked-research) are also accepted 
as long as the final camera-ready version conforms to Springer's requirements.


https://github.com/csarven/linked-research is an initiative, not an 
authoring tool.


https://github.com/linkeddata/dokieli is the tooling you want to mention 
in your calls. You can read more about that here: 
http://csarven.ca/dokieli or here https://dokie.li/


-Sarven
http://csarven.ca/#i



Re: CfP: International Workshop on Completing and Debugging the Semantic Web (CODES 2016)

2016-01-27 Thread Sarven Capadisli

On 2016-01-25 09:50, Heiko Paulheim wrote:

CoDeS 2016

International Workshop on
Completing and Debugging the Semantic Web

May 29 or 30, 2016
Heraklion, Greece

http://www.ida.liu.se/~patla00/conferences/CoDeS16/

co-located with ESWC 2016 (http://2016.eswc-conferences.org/)





Submission Guidelines

Paper submission and reviewing for this workshop will be electronic via 
EasyChair. The papers should be written in English, follow Springer LNCS 
format, and be submitted in PDF.



There is a bug in the submission requirements. Do you accept PRs?

-Sarven
http://csarven.ca/#i



Re: CFP 2nd International Workshop on Computational History and Data-Driven Humanities

2016-01-27 Thread Sarven Capadisli

On 2016-01-27 12:51, Christophe Debruyne wrote:

Scope:
This workshop focuses on the challenges and opportunities of data-driven
humanities and seeks to bring together world-leading scientists and
scholars at the forefront of this emerging field, at the interface
between computer science, social science, humanities and mathematics.

As historical knowledge becomes increasingly available in forms that
computers can process, this data becomes amenable to large-scale
computational analysis and interpretation. What are the impacts for
humanities, social sciences, computer science and complex systems?
Perhaps mathematical analysis of the dynamic, evolutionary patterns
observed in the data helps us to better understand the past and can even
produce empirically-grounded predictions about the future.

We seek
* computer scientists and digital humanities experts to introduce
technologies and tools they have applied in order to extract knowledge
from historical records in a form that can be processed by computers
without losing its meaningfulness.
* scientists working at the forefront of mathematical and theoretical
analysis of historical data, to describe what is possible with current
tools.


Sounds great!


*Submission guidelines
Submission URL is: https://easychair.org/conferences/?conf=chdh2016

Submitted papers must be original, unpublished, and not submitted to
another conference or journal for consideration. Accepted papers will be
presented at the conference. All submitted papers will be evaluated
based on originality, significance, technical soundness, and clarity of
expression. All papers will be refereed by 3 members of the PC. All
submissions must be in English. We solicit short papers describing (i)
new ideas (5-6 pages) and (ii) longer papers presenting more
tangible results (max. 10 pages). At least one author of each accepted
paper must register by the early date indicated on the conference
website and present the paper. Authors must follow the Springer LNCS
formatting instructions: http://www.springer.com/computer/lncs.

*Highlights:
All accepted papers will be published by Springer and made available
through IFIP Digital Library, one of the world's largest scientific
libraries. Proceedings will be submitted for indexing by Google Scholar,
ISI, EICompendex, Scopus and many more. Accepted papers after
presentation and extension may be invited to be published in a special
issue of Cliodynamics: The Journal of Quantitative History and Cultural
Evolution (e-ISSN: 2373-7530) and indexed by Scopus.


It sounds like this workshop is encouraging and endorsing the following:

* Store scholarly articles essentially in PDF
* Hand the knowledge over to a 3rd party company

Is that an accurate summary?

-Sarven
http://csarven.ca/#i



Re: SEMANTiCS 2016, Leipzig, Sep 12-15, Call for Research & Innovation Papers

2016-01-19 Thread Sarven Capadisli

On 2016-01-18 04:56, Sebastian Hellmann wrote:

Papers must be submitted in PDF (Adobe's Portable Document Format)
format. Other formats will not be accepted. For the camera-ready
version, the source files (Latex, WordPerfect, Word) will also be needed.


Sigh. So much for "SEMANTiCS".

-Sarven
http://csarven.ca/#i



Re: Please publish Turtle or JSON-LD instead of RDF/XML [was Re: Recommendation for transformation of RDF/XML to JSON-LD in a web browser?]

2015-09-03 Thread Sarven Capadisli

On 2015-09-03 19:03, David Booth wrote:

I encourage all RDF publishers to use one of the other standard RDF
formats such as Turtle or JSON-LD.  All commonly used RDF tools now
support Turtle, and many or most already support JSON-LD.


I have grown to (or, to be brutally honest, have tried very hard to) 
remain agnostic about the RDF formats. This is simply because, given 
sufficient context, it is trivial to point out which format is 
preferable for both publication and expected consumption.


The decision to pick one or more formats over the others can easily boil 
down to understanding how, and by what, the formats will be handled in 
the whole data pipeline.


It is great to see newcomers learn N-Triples/Turtle, because it is as 
human-friendly as it gets (at this time) for reading and writing 
statements. That experience is also an excellent investment towards 
SPARQL. Having said that, we are not yet at a state where we can publish 
semantically meaningful HTML documents by authoring Turtle. There is the 
expectation that some other out-of-band code or application needs to 
wrap it all up.


By the same token, JSON-LD is excellent for building applications by 
imperative means; however, it is stuck in a world where it is dependent 
on programming languages to manipulate and make use of the data. To 
generate it, it depends on something else as well. <Insert rant about 
tree structure, just like RDF/XML, here. GOTO 10>.


At the end of the day, however the data is pulled or pushed, it needs to 
end up on some user interface. That UI is arguably and predominantly an 
HTML document. Hence my argument that all roads lead to HTML.


As I see it, RDFa gets the most mileage of all the formats for prose 
content, and a fair amount of reuse. It ends up on a webpage that is 
intended for humans, while remaining machine-friendly. A single code 
base (which is mostly declarative), a single GET, a single URL 
representation to achieve all of that.
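
A minimal sketch of that single-code-base idea (hypothetical markup; the 
schema.org vocabulary is chosen arbitrarily):

<div prefix="schema: http://schema.org/"
     resource="http://example.org/article" typeof="schema:Article">
  <h1 property="schema:name">An Article Title</h1>
  <p property="schema:abstract">Readable by humans, parseable by
  machines, served with a single GET.</p>
</div>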


I still remain agnostic on this matter, because there is no 
one-size-fits-all. After all, on the command line, N-Triples still has 
the last word.


So, as long as one speaks the RDF language, the rest tends to be 
something that the machines should be doing on behalf of humans anyway, 
and that ought to remain the primary focus. That is: speak RDF, and keep 
improving the UI for it.


All formats are bound to age - with the exception of HTML of course, 
because it still rocks and has yet to fail! ;)


-Sarven
http://csarven.ca/#i



Re: Discovering a query endpoint associated with a given Linked Data resource

2015-08-26 Thread Sarven Capadisli

On 2015-08-26 10:45, Nandana Mihindukulasooriya wrote:

Hi,

Is there a standard or widely used way of discovering a query endpoint
(SPARQL/LDF) associated with a given Linked Data resource?

I know that a client can use the follow-your-nose and related link
traversal approaches such as [1], but I wonder if it is possible to
have a hybrid approach in which dereferenceable Linked Data
resources optionally advertise query endpoint(s) in a standard way
so that clients can perform queries on related data.

To clarify the use case a bit, when a client dereferences a resource URI
it gets a set of triples (an RDF graph) [2].  In some cases, it might be
that the returned graph is a subgraph of a named graph /
default graph of an RDF dataset. The client wants to discover a query
endpoint that exposes the relevant dataset, if one is available.

For example, something like the following using the search link
relation [3].

--
HEAD /resource/Sri_Lanka
Host: http://dbpedia.org
--
200 OK
Link: <http://dbpedia.org/sparql>; rel="search"; type="sparql",
<http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf"
... other headers ...
--

Best Regards,
Nandana

[1]
http://swsa.semanticweb.org/sites/g/files/g524521/f/201507/DissertationOlafHartig_0.pdf
[2] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-rdf-graph
[3] http://www.iana.org/assignments/link-relations/link-relations.xhtml


Sort of. See void:sparqlEndpoint and /.well-known/void
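
A minimal sketch in Turtle (URIs hypothetical), e.g., served at 
/.well-known/void:

@prefix void: <http://rdfs.org/ns/void#> .

<http://example.org/dataset>
  a void:Dataset ;
  void:uriSpace "http://example.org/resource/" ;
  void:sparqlEndpoint <http://example.org/sparql> .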

-Sarven
http://csarven.ca/#i



Re: Discovering a query endpoint associated with a given Linked Data resource

2015-08-26 Thread Sarven Capadisli

On 2015-08-26 10:56, Sarven Capadisli wrote:

On 2015-08-26 10:45, Nandana Mihindukulasooriya wrote:

Hi,

Is there a standard or widely used way of discovering a query endpoint
(SPARQL/LDF) associated with a given Linked Data resource?

I know that a client can use the follow-your-nose and related link
traversal approaches such as [1], but I wonder if it is possible to
have a hybrid approach in which dereferenceable Linked Data
resources optionally advertise query endpoint(s) in a standard way
so that clients can perform queries on related data.

To clarify the use case a bit, when a client dereferences a resource URI
it gets a set of triples (an RDF graph) [2].  In some cases, it might be
that the returned graph is a subgraph of a named graph /
default graph of an RDF dataset. The client wants to discover a query
endpoint that exposes the relevant dataset, if one is available.

For example, something like the following using the search link
relation [3].

--
HEAD /resource/Sri_Lanka
Host: http://dbpedia.org
--
200 OK
Link: <http://dbpedia.org/sparql>; rel="search"; type="sparql",
<http://fragments.dbpedia.org/2014/en#dataset>; rel="search"; type="ldf"
... other headers ...
--

Best Regards,
Nandana

[1]
http://swsa.semanticweb.org/sites/g/files/g524521/f/201507/DissertationOlafHartig_0.pdf

[2]
http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-rdf-graph
[3] http://www.iana.org/assignments/link-relations/link-relations.xhtml


Sort of. See void:sparqlEndpoint and /.well-known/void


.. and sd:Service
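
Likewise, a minimal sketch of a SPARQL 1.1 Service Description (URIs 
hypothetical):

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

<http://example.org/sparql#service>
  a sd:Service ;
  sd:endpoint <http://example.org/sparql> ;
  sd:supportedLanguage sd:SPARQL11Query .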

-Sarven
http://csarven.ca/#i



SemStats 2015 Call for Challenge

2015-08-03 Thread Sarven Capadisli

SemStats 2015 Call for Challenge


Document ID
http://semstats.org/2015/call-for-challenge

Hashtags
#ISWC2015 #SemStats

Event
3rd International Workshop on Semantic Statistics co-located with 
14th International Semantic Web Conference (ISWC 2015)


Location
Bethlehem, U.S.

Date
October 11, 2015

Abstract


The SemStats Challenge is back with more action! It is organized in the 
context of the SemStats 2015 workshop. Participants are invited to apply 
statistical techniques and semantic web technologies.


The challenge will consist of the realization of mashups or 
visualizations, but also of comparisons, analytics, alignment, and 
enrichment of the data and concepts involved in statistical data (see 
below for the data made available and additional requirements).


The deadline for participants to submit their challenge papers (up to 6 
pages, same submission guidelines as the Call for Papers 
http://semstats.org/2015/call-for-papers#submissions ) and application 
is Monday 21 September, 2015, 23:59 Hawaii Time. Submission is done 
via EasyChair ( https://www.easychair.org/conferences/?conf=semstats2015 
) by selecting the Challenge paper category.


It is strongly suggested that all challenge participants send contact 
information to semstats2...@easychair.org in order to be kept informed 
in case of any changes in the data provided.


Open Data Track
===

This year we are going with the Open Track: papers must describe a 
publicly available application. We would love to see everyone play and 
learn from what you have created. You are welcome to use any statistical 
data whether it is already in Linked Data shape or not! While you are at 
it, why not combine it with data from other domains?


Here are some dataset suggestions for this year’s challenge:

Italian National Institute of Statistics
Istat makes the Italian Population and Housing Census 2011 available as 
Linked Data: http://datiopen.istat.it/ . See also the description of the 
data and metadata, and an example dataset.


Scottish Government
The Scottish Statistics Beta http://statisticsbeta.com/ (soon to be 
http://statistics.gov.scot/) provides you with the data behind their 
official statistics on Neighbourhood Statistics.


UK Department for Communities and Local Government
UK DCLG provides their official Linked Open Data 
http://opendatacommunities.org/ of a selection of statistics [...] 
including Local Government finance, housing and homelessness, wellbeing, 
deprivation, and the department’s business plan as well as supporting 
geographical data.


Flemish Government
Statistics from the Flemish Government makes their statistical cubes 
available with SKOS and XKOS hierarchies: 
http://data.opendataforum.info/organization/5294d629-c439-4e60-8ade-4da1f9068cc5?res_format=TTL


And more:
http://270a.info/
http://eurostat.linked-statistics.org/
http://data.cso.ie/
http://linked-statistics.gr/
http://linkedspending.aksw.org/
http://cedar-project.nl/
http://datahub.io/ – Whatever is here ;)



-Sarven
http://csarven.ca/#i



[CFP] Third International Workshop on Semantic Statistics (SemStats 2015)

2015-07-03 Thread Sarven Capadisli
 challenges when trying to adopt 
Semantic Web technologies, in particular:


* difficulty to create and publish linked data: this can be alleviated 
by providing methods, tools, lessons learned and best practices, by 
publicizing successful examples and by providing support.
* difficulty to see the purpose of publishing linked data: we must 
develop end-user tools leveraging statistical linked data, provide 
convincing examples of real use in applications or mashups, so that the 
end-user value of statistical linked data and metadata appears more clearly.
* difficulty to use external linked data in their daily activity: it is 
important to develop statistical methods and tools especially tailored 
for linked data, so that statisticians can get accustomed to using them 
and get convinced of their specific utility.


To conclude, statisticians know how misleading it can be to exploit 
semantic connections without carefully considering and weighing 
information about the quality of these connections, the validity of 
inferences, etc. A challenge for them is to determine, to ensure and to 
inform consumers about the quality of semantic connections which may be 
used to support analysis in some circumstances but not others. The 
workshop will enable participants to discuss these very important issues.


Topics
==
The workshop will address topics related to statistics and linked data. 
This includes but is not limited to:


How to publish linked statistics?
* What are the relevant vocabularies for the publication of statistical 
data?
* What are the relevant vocabularies for the publication of statistical 
metadata (code lists and classifications, descriptive metadata, 
provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software 
packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data 
lifecycle?

* How do we establish, document and share best practices?

How to use linked data for statistics?
* Where and how can we find statistics data: data catalogues, dataset 
descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability, 
etc.)?
* How can we perform data reconciliation, ontology matching and instance 
matching with statistical data?
* How can we apply statistical processes on linked data: data analysis, 
descriptive statistics, estimation, correction?
* How to intuitively represent statistical linked data: visual 
analytics, results of data mining?


Submissions
===
This workshop is aimed at an interdisciplinary audience of researchers 
and practitioners involved or interested in Statistics and the Semantic 
Web. All papers must represent original and unpublished work that is not 
currently under review. Papers will be evaluated according to their 
significance, originality, technical content, style, clarity, and 
relevance to the workshop. At least one author of each accepted paper is 
expected to attend the workshop.


Workshop participation is available to ISWC 2015 attendants at an 
additional cost, see http://iswc2015.semanticweb.org/registration for 
details.


The workshop will also feature a challenge based on Census Data 
published on the web or provided by Statistical Institutes. It is 
expected that data from Australia, France and Italy will be available. 
The challenge will consist in the realization of mashups or 
visualizations, but also on comparisons, alignment and enrichment of the 
data and concepts involved.


We welcome the following types of contributions:

* Full research papers (up to 12 pages)
* Short papers (up to 6 pages)
* Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted 
according to the information for LNCS Authors (see 
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0).  Please 
note that (X)HTML(+RDFa) submissions are also welcome as long as the 
layout complies with the LNCS style. Authors can for example use the 
template provided at https://github.com/csarven/linked-research. 
Submissions are NOT anonymous. Please submit your contributions 
electronically in PDF format at 
http://www.easychair.org/conferences/?conf=semstats2015 and before July 
15, 2015, 23:59 Hawaii Time. All accepted papers will be archived in 
electronic proceedings published by CEUR-WS.org.


If you are interested in submitting a paper but would like more 
preliminary information, please contact semstats2...@easychair.org.


Chairs
==
* Sarven Capadisli, University of Bonn, Germany
* Franck Cotton, INSEE, France
* Armin Haller, ANU, Australia
* Evangelos Kalampokis, CERTH/ITI and University of Macedonia, Greece
* Monica Scannapieco, Istat, Italy
* Raphaël Troncy, EURECOM, France

Program Committee
=
* Stefano Abbruzzini
* Phil Archer
* Ghislain Atemezing
* Chris Beer
* Oscar Corcho
* Stefano De Francisci
* Miguel Expósito
* Dan Gillman
* Arofan Gregory
* Tudor Groza

Re: Vocabulary to describe software installation

2015-05-04 Thread Sarven Capadisli

On 2015-05-01 16:22, Jürgen Jakobitsch wrote:

hi,

i'm investigating possibilities to describe an arbitrary software
installation process
in rdf. currently i've found two candidates [1][2], but examples are
practically non-existent.
has anyone done this before, are there somewhere real examples?

any pointer greatly appreciated.

wkr j

[1] http://www.w3.org/2005/Incubator/ssn/ssnx/ssn#Module_Deployment
[2]
http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:instructionalMethod


You may have looked into this already, but I'll mention it anyway, in 
case it is of use to someone else.


Consider using OPMW [3], P-PLAN [4] and PROV-O [5]. These aren't 
exclusively for software processes, but for any general process.


Depending on the granularity you want to work with, it might fit the 
bill. We have experimented with describing the actual workflows, e.g., 
[6], but IIRC we have not executed an action from the descriptions.


[3] http://www.opmw.org/model/OPMW
[4] http://purl.org/net/p-plan
[5] http://www.w3.org/TR/prov-o/
[6] 
https://github.com/csarven/doingbusiness-linked-data/tree/dev/scripts 
(note: in development)
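
For a rough flavour of what a P-PLAN description of an installation 
could look like (URIs hypothetical; a sketch, not a complete recipe):

@prefix p-plan: <http://purl.org/net/p-plan#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#install> a p-plan:Plan ;
  rdfs:label "Install some software" .

# Steps are ordered via isPrecededBy rather than hard-coded positions
<#step1> a p-plan:Step ;
  rdfs:label "Download the package" ;
  p-plan:isStepOfPlan <#install> .

<#step2> a p-plan:Step ;
  rdfs:label "Run the installer" ;
  p-plan:isStepOfPlan <#install> ;
  p-plan:isPrecededBy <#step1> .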


-Sarven
http://csarven.ca/#i



Re: Vocabulary to describe software installation

2015-05-04 Thread Sarven Capadisli

On 2015-05-04 01:38, Jürgen Jakobitsch wrote:

first experiment in action...

apache mesos 0.22.0 installation instruction :

sparql : http://goo.gl/6euq1I
turtle:
http://software.turnguard.com/apache_mesos/0/22/0/installation/opensuse/13/2.ttl


btw: do we have a datatype for bash/shell/python/whatever scripts (i.e.:
shell:command "ls"^^xsd:shellcom)?

i'm somehow tempted to = run =  the sparql results...

wkr j

Fantastic!

Some (possibly trivial) feedback/considerations:

* Exclude sudo from the actions.
* Use absolute paths instead of relative.
* Consider integrating decisions on success/failure of steps, e.g., 
should step 11 run if step 10 is a failure?


Re: what I mentioned about OPMW/P-PLAN/PROV-O in the previous email: if 
the steps themselves focus only on describing what they are about, and 
exclude when they should be executed, they can be reused in other 
contexts without having to redefine the step. From that point, all 
that's necessary is coming up with a new workflow plan that simply 
refers to the steps.
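
For instance (URIs hypothetical), two plans could refer to the same step 
descriptions, with only the plan membership and ordering defined per 
plan:

@prefix p-plan: <http://purl.org/net/p-plan#> .

# The step says only what it is about, not when to run it
<#install-mesos> a p-plan:Step ;
  p-plan:isStepOfPlan <#plan-opensuse>, <#plan-debian> .

<#configure-mesos> a p-plan:Step ;
  p-plan:isStepOfPlan <#plan-opensuse> ;
  p-plan:isPrecededBy <#install-mesos> .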


-Sarven
http://csarven.ca/#i



[CFP] Third International Workshop on Semantic Statistics (SemStats 2015)

2015-04-29 Thread Sarven Capadisli
 methods, tools, lessons learned and best practices, by 
publicizing successful examples and by providing support.
* difficulty to see the purpose of publishing linked data: we must 
develop end-user tools leveraging statistical linked data, provide 
convincing examples of real use in applications or mashups, so that the 
end-user value of statistical linked data and metadata appears more clearly.
* difficulty to use external linked data in their daily activity: it is 
important to develop statistical methods and tools especially tailored 
for linked data, so that statisticians can get accustomed to using them 
and get convinced of their specific utility.


To conclude, statisticians know how misleading it can be to exploit 
semantic connections without carefully considering and weighing 
information about the quality of these connections, the validity of 
inferences, etc. A challenge for them is to determine, to ensure and to 
inform consumers about the quality of semantic connections which may be 
used to support analysis in some circumstances but not others. The 
workshop will enable participants to discuss these very important issues.


Topics
==
The workshop will address topics related to statistics and linked data. 
This includes but is not limited to:


How to publish linked statistics?
* What are the relevant vocabularies for the publication of statistical 
data?
* What are the relevant vocabularies for the publication of statistical 
metadata (code lists and classifications, descriptive metadata, 
provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software 
packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data 
lifecycle?

* How do we establish, document and share best practices?

How to use linked data for statistics?
* Where and how can we find statistics data: data catalogues, dataset 
descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability, 
etc.)?
* How can we perform data reconciliation, ontology matching and instance 
matching with statistical data?
* How can we apply statistical processes on linked data: data analysis, 
descriptive statistics, estimation, correction?
* How to intuitively represent statistical linked data: visual 
analytics, results of data mining?


Submissions
===
This workshop is aimed at an interdisciplinary audience of researchers 
and practitioners involved or interested in Statistics and the Semantic 
Web. All papers must represent original and unpublished work that is not 
currently under review. Papers will be evaluated according to their 
significance, originality, technical content, style, clarity, and 
relevance to the workshop. At least one author of each accepted paper is 
expected to attend the workshop.


Workshop participation is available to ISWC 2015 attendants at an 
additional cost, see http://iswc2015.semanticweb.org/registration for 
details.


The workshop will also feature a challenge based on Census Data 
published on the web or provided by Statistical Institutes. It is 
expected that data from Australia, France and Italy will be available. 
The challenge will consist in the realization of mashups or 
visualizations, but also on comparisons, alignment and enrichment of the 
data and concepts involved.


We welcome the following types of contributions:

* Full research papers (up to 12 pages)
* Short papers (up to 6 pages)
* Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted 
according to the information for LNCS Authors (see 
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0).  Please 
note that (X)HTML(+RDFa) submissions are also welcome as long as the 
layout complies with the LNCS style. Authors can for example use the 
template provided at https://github.com/csarven/linked-research. 
Submissions are NOT anonymous. Please submit your contributions 
electronically in PDF format at 
http://www.easychair.org/conferences/?conf=semstats2015 and before July 
15, 2015, 23:59 Hawaii Time. All accepted papers will be archived in 
electronic proceedings published by CEUR-WS.org.


If you are interested in submitting a paper but would like more 
preliminary information, please contact semstats2...@easychair.org.


Chairs
==
* Sarven Capadisli, University of Bonn, Germany, and Bern University of 
Applied Sciences, Switzerland

* Franck Cotton, INSEE, France
* Armin Haller, CSIRO, Australia
* Evangelos Kalampokis, CERTH/ITI and University of Macedonia, Greece
* Monica Scannapieco, Istat, Italy
* Raphaël Troncy, EURECOM, France

Program Committee
=
(To be confirmed)
* Stefano Abbruzzini
* Phil Archer
* Ghislain Atemezing
* Hadley Beeman
* Ric Clarke
* Oscar Corcho
* Richard Cyganiak
* Stefano De Francisci
* Jay Devlin
* Miguel Expósito
* Dan Gillman
* Alberto González Yanes
* Arofan Gregory
* Tudor Groza
* Christophe

Re: Research Track at 14th International Semantic Web Conference (ISWC2015). Abstract deadline in 2 weeks!

2015-04-09 Thread Sarven Capadisli

On 2015-04-09 06:53, Juan Sequeda wrote:

All research submissions must be in English, and no longer than 16 pages.
Papers that exceed this limit will be rejected without review. Submissions
must be in PDF formatted in the style of the Springer Publications format
for Lecture Notes in Computer Science (LNCS). For details on the LNCS
style, see Springer’s Author Instructions. ISWC-2015 submissions are not
anonymous.

We also encourage authors to include pointers to any additional material
that supports the scientific claims made in their papers (e.g., extended
technical reports, source code, datasets, links to applications).


It is interesting to note that this call finds it necessary to state 
that anything which goes in the direction of reproducibility is 
encouraged, as opposed to being mandatory and taken as a given in the 
*spirit of science*.


Essentially, what *really* qualifies as scientific work (or close 
enough) for ISWC research articles is that it needs to *first and 
foremost* fit into the boundaries of a rectangular box (e.g., an A4 
paper) and, when printed, be at the stated maximum page length. Second, 
it needs to get approval from a couple of anonymous reviewers.



Authors of accepted papers will be required to provide semantic annotations
for the abstract of their submission, which will be made available on the
conference web site. Details will be provided at the time of acceptance.


WONTFIX.

This is a dead-end exercise. A data-silo is a data-silo, no matter the 
make-up it puts on.


If the leftover metadata gleaned from these papers were of any real use, 
we would have seen something remarkably informative about the research 
field over the years by now: which problems are actually solved, which 
remain open - a gap in research - and what types of funding 
opportunities to distill. The Semantic Web Dogfood can only do so much 
with what it is given. The real barriers are set beforehand, at the time 
of communicating the research results. The communication is strictly 
visual, linear, non-interactive, and anti-social.



Accepted papers will be distributed to conference attendees and also
published by Springer in the printed conference proceedings, as part of the
Lecture Notes in Computer Science series.


There is not a hint of Web here.

-Sarven
http://csarven.ca/#i



Onwards, upwards, and any way but backwards

2015-04-01 Thread Sarven Capadisli
Are you disturbed by the fact that your printer is outputting your 
papers in the wrong order? Did the evil jokesters at your IT department 
disable page settings on your Windows ME?


At Cyberdyne Systems Linked Research, we have you covered. We are always 
with you in the printer room, where every second counts, where every 
tree doesn't count. If you love your kids' kids, and the future of 
humanity and research, you deserve the right to print right side up and 
upside down.


Technology, making better, better. Onwards, upwards, and any way but 
backwards. The future is happening today at Cyberdyne Systems Linked 
Research.


http://linked-research.270a.info/


-Sarven
http://csarven.ca/#i



Call for Linked Research

2015-03-24 Thread Sarven Capadisli

Call for Linked Research


Purpose: To encourage the “do it yourself” behaviour for sharing and 
reusing research knowledge.


Deadline: You decide.

From http://csarven.ca/call-for-linked-research :

Scientists and researchers who work in Web Science have to follow the 
rules that are set by the publisher: researchers need read and reuse 
access to other researchers' work, yet have to adopt archaic 
desktop-native publishing workflows. Publishers try to remain the 
middleman for society's knowledge acquisition.


Nowadays, there is more machine-friendly data and documentation made 
available by the public sector than the Linked Data research community. 
The general public asks for open and machine-friendly data, and they are 
following up. Web research publishing on the other hand, is stuck on one 
★ (star) Linked Data deployment scheme. The community has difficulty 
eating its own dogfood for research publication, and fails to deliver 
its share of the promise.


There is a social problem. Not a technical one. If you think that there 
is something fundamentally wrong with this picture, want to voice 
yourself, and willing to continue to contribute to the vision of the 
Web, then please consider the following before you write your research:


Linked Research: Do It Yourself

1. Publish your research and findings at a Web space that you control.

2. Publish your progress and work following the Linked Data design 
principles. Create a URI for everything that is of some value to you and 
may be to others e.g., hypothesis, workflow steps, variables, 
provenance, results etc.


3. Reuse and link to other researchers' URIs of value, so nothing goes 
to waste or is reinvented without good reason.


4. Create a strong user experience in the spirit of science: Use screen 
and print stylesheets. Create a copy of a view for the research 
community to fulfil organisational requirements. Design interactive 
user-interfaces for improved communication and education.


5. Announce your work publicly so that people and machines can discover it.

6. Have an open comment system policy for your document so that any 
person or machine can give feedback.


7. Help, encourage, and motivate others to do the same.

There is no central authority to judge the value of your contributions. 
You do not need permission to publish! Control your own research and 
communication.



-Sarven
http://csarven.ca/#i



Re: Research Track at 14th International Semantic Web Conference (ISWC2015). Abstract deadline in 1 month

2015-03-24 Thread Sarven Capadisli

On 2015-03-24 06:01, Juan Sequeda wrote:

14th International Semantic Web Conference (ISWC2015)
Bethlehem, Pennsylvania,
  USA October 11 - 15, 2015

Deadline: Abstract April 23, Full paper submission April 30


ISWC is the premier international forum for presenting research results on
the Intelligent Processing of Data on the Web. ISWC brings together
researchers from different areas of computer science, such as artificial
intelligence, databases, distributed systems and information retrieval who
aim at the development and use of novel technologies and techniques for
accessing, interpreting, processing and using information on the web in a
more effective way.

We solicit the submission of original research papers for ISWC 2015's
research track, dealing with analytical, theoretical, empirical, and
practical aspects of all these areas (including not only the ones that use
the RDF-based stack but also those that use other representations).
Submissions to the research track should describe original and significant
research in any of these areas.

To maintain the high level of quality and impact of the ISWC series, all
papers will be reviewed by at least three program committee members and one
senior program committee member. To assess papers, reviewers will judge
their originality and significance for further advances in any of these
areas, as well as the technical soundness of the proposed approaches and
the overall readability of the submitted papers. We also encourage authors
to include pointers to any additional material that supports the scientific
claims made in their papers (e.g., extended technical reports, source code,
datasets, links to applications).

Authors should also read carefully the calls for the other tracks, and
consider submitting to the most appropriate track. In the interest of a
coherent program the track chairs may suggest moving a submission to a
different track of the conference, with authors consent. However, multiple
submissions of the same paper to different tracks are not acceptable.

= Topics of Interest =
Topics of interest include, but are not limited to:

- Management of semantics and data on the Web, including Linked Data
- Languages, tools, and methodologies for representing and managing
semantics and data on the Web
- Database, Information Retrieval, Information Extraction, Natural Language
Processing and Artificial Intelligence techniques for the Semantic Web
- Searching and querying the Semantic Web
- Knowledge representation and reasoning on the Web
- Cleaning, quality assurance, and provenance of Semantic Web data,
services, and processes
- Semantic Web data analysis
- Ontology-based data access and integration/exchange on the Web
- Supporting multi-linguality in the Semantic Web
- User Interfaces and interaction with semantics and data on the Web
- Information visualization of Semantic Web data
- Personalized access to Semantic Web data and applications
- Geospatial semantics and data on the Web
- Cyber-Physical Social systems, data streams and the Internet of Things
- Semantic technologies for mobile platforms
- Ontology engineering and ontology patterns for the Web
- Ontology modularity, mapping, merging, and alignment for the Web
- Trust, privacy, and security on the Semantic Web
- Semantic Web and Linked Data for Cloud environments

= Review Criteria =
Papers in this track will be reviewed according to the following criteria:
- Novelty
- Relevance and impact of the research contributions to the area
- Soundness
- Design and execution of the evaluation of the work
- Clarity and presentation

= Submission =

Pre-submission of abstracts is a strict requirement. All papers and
abstracts have to be submitted electronically via
https://easychair.org/conferences/?conf=iswc2015research.

All research submissions must be in English, and no longer than 16 pages.
Papers that exceed this limit will be rejected without review. Submissions
must be in PDF formatted in the style of the Springer Publications format
for Lecture Notes in Computer Science (LNCS). For details on the LNCS
style, see Springer’s Author Instructions. ISWC-2015 submissions are not
anonymous.

We also encourage authors to include pointers to any additional material
that supports the scientific claims made in their papers (e.g., extended
technical reports, source code, datasets, links to applications).

Authors of accepted papers will be required to provide semantic annotations
for the abstract of their submission, which will be made available on the
conference web site. Details will be provided at the time of acceptance.

Accepted papers will be distributed to conference attendees and also
published by Springer in the printed conference proceedings, as part of the
Lecture Notes in Computer Science series. At least one author of each
accepted paper must register for the conference and present the paper there.

= Prior Publication And Multiple Submissions =

ISWC 2015 will 

Re: Looking for pedagogically useful data sets

2015-03-12 Thread Sarven Capadisli

On 2015-03-12 00:13, Paul Houle wrote:

Hello all,

   I am looking for some RDF data sets to use in a short presentation on
RDF and SPARQL.  I want to do a short demo,  and since RDF and SPARQL will
be new to this audience,  I was hoping for something where the predicates
would be easy to understand.

  I was hoping that the LOGD data from RPI/TWC would be suitable,  but
once I found the old web site (the new one is down) and manually fixed the
broken download link I found the predicates were like

http://data-gov.tw.rpi.edu/vocab/p/1525/v96

and the only documentation I could find for them (maybe I wasn't looking in
the right place) was that this predicate has an rdf:label of V96.)

Note that an alpha+numeric code is good enough for Wikidata and it is
certainly concise,  but I don't want :v96 to be the first things that these
people see.

Something I like about this particular data set is that it is about 1
million triples which is big enough to be interesting but also small enough
that I can load it in a few seconds,  so that performance issues are not a
distraction.

The vocabulary in DBpedia is closer to what I want (and if I write the
queries most of the distracting things about vocab are a non-issue) but
then data quality issues are the distraction.

So what I am looking for is something around 1 m triples in size (in terms
of order-of-magnitude) and where there are no distractions due to obtuse
vocabulary or data quality issues.  It would be exceptionally cool if there
were two data sets that fit the bill and I could load them into the triple
store together to demonstrate mashability

Any suggestions?




Re: "predicates would be easy to understand": whether the label is "V96" 
or some molecule, needless to say, understanding it takes some level of 
familiarity with the data.


Perhaps something that's familiar to most people is Social Web data. I 
suggest looking at whatever is around vCard, FOAF, or SIOC, for instance. 
The giant portion of the LOD Cloud with the StatusNet nodes (in cyan) 
uses FOAF and SIOC. (IIRC, unless GnuSocial is up to something else these 
days.)



If statistical LD is of interest, check out whatever is under 
http://270a.info/ (follow the VoIDs to respective dataspaces). You can 
reach close to 10k datasets there, with varying sizes. I think the best 
bet for something small enough is to pick one from the 
http://worldbank.270a.info/ dataspace e.g., GDP, mortality, education..


Or take an observation from somewhere, e.g:

http://ecb.270a.info/dataset/EXR/Q/ARS/EUR/SP00/A/2000-Q2

and follow-your-nose.

You can also approach from a graph exploration POV, e.g:

http://en.lodlive.it/?http://worldbank.270a.info/classification/country/CA

or a visualization, e.g., Sparkline (along the lines of how it was 
suggested by Edward Tufte):


http://stats.270a.info/sparkline

(JavaScript inside SVG building itself by poking at the SPARQL endpoint)

If you want to demonstrate what other types of things you can do with 
this data, consider something like:


http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2011/year:2011

See also "Oh Yeah?" and so on..


Anyway... as a starting point, social data/vocabs may be easier to get 
across, but then you always have to (IMHO) show some applications or 
visualizations for the data to drive the ideas home.


-Sarven
http://csarven.ca/#i



Will or can academic publishers accept submissions in HTML-and-friends?

2015-03-05 Thread Sarven Capadisli

Hi, I have a question:

Will or can academic publishers accept submissions in HTML-and-friends?

It would be great to hear from our colleagues at academic publishing 
companies. Any and all of your responses are most welcome. That is, it 
does not have to be an official statement nor formal in any way. This is 
a friendly ping :)


Looking forward to your responses.

-Sarven
http://csarven.ca/#i



Re: linked open data and PDF

2015-02-03 Thread Sarven Capadisli

On 2015-01-30 16:48, Larry Masinter wrote:

  There are a number of issues
and shortcomings with the PDF approach which in the end will not play
well with what the Web is intended to be, nor how it functions.


I think I have familiarity with what the Web is intended to
be, and how it functions, and I disagree.

One of the earliest advances of the web (from Tim's
original HTML-only design) was the introduction of
support for multiple formats, including image and
document representations.  It would greatly
improve the open data initiative to not restrict LD
to HTML.


No one is restricting LD to HTML. Evidently, that is not the case, nor 
should it be that way. FYI, RDF serializations lead the way in LD. But, 
for the human/end-user, all roads almost always lead to HTML.


The multiple formats are indeed supported, but their mileage varies in 
how we get hold of them. We have HTML, which tries to address their 
accessibility and discoverability. It is clear that PDFs are data-silos, 
since we do not hop from one (binary document) to another. While linking 
is possible, at the end of the day, there is a UX problem. There is no 
ubiquitous experience which allows one to switch between PDF and HTML 
resources on a given device, operating system, and software (e.g., Web 
browser, PDF reader). Jumping between them is awkward, and for the sake 
of what? How or why would that UX be in any way preferable for the user? 
Surely, that can be improved; as you well know, Web browsers can 
display PDFs nowadays. But still, that's just an annoyance (or, 
depending on who you ask, a convenience).


Surely, you also know why timbl decided not to use TeX as the language 
to author and exchange documents on the Web.


I stand by my original point that HTML is a good bet. The burden of 
proof that PDF is somehow Web- or LD-friendly lies on the shoulders of 
enthusiasts and stakeholders. Make it so.


This is not to discourage any format striving to be more open and 
machine-friendly on the Web.



Your other points:

not fault tolerant


What are the kinds of faults you think should be tolerated
but are not? I looked through
  http://csarven.ca/enabling-accessible-knowledge
but I'm still not sure what you mean.


Open up a (La)TeX/Word/PDF file and remove a non-content character - I 
hope we don't have to debate about which character. Is the document 
still useful? What kind of error-handling is there in corresponding 
readers or anything that can make an HTTP call and display the response 
for the human? Compare that with HTML.



machine-friendly (regardless of what can be stuffed into XMP),


I think machine-friendly in LOD context,  means that there are
readily available tools to add, extract, manipulate.

And it should be possible to annotate any format that
is suitable.


With that line of reasoning, practically anything is machine-friendly - 
not to mention that it is something we are striving for anyway. For 
instance, an image of text is certainly machine-friendly if OCR can be 
applied to it, or if one can point their camera at some text on the wall 
and have it translate the words. But I suspect that many would argue 
over whether an image is machine-friendly in the LD context. Is there a 
fundamental difference between PDF and, say, a JPEG in the context of 
LD? I'm ignorant on this matter, as I have difficulty spotting one.



and will not scale.


This baffles me, what scaling do you have in mind?
I've worked with 2000-page PDF files, which, when
served from HTTP servers with range retrieval,
can incrementally display quickly. There may be
some performance goals specific to 'data'?


First, I'm not suggesting that PDF is not widely used in a (desktop) 
environment with pre-installed software, but rather that its access over 
the Web is not that great.


This relates to ease of creating, publishing, and maintaining PDF 
documents. If PDF had a strong case, I would argue that we'd see a 
different Web than the one we are using now.



At the end of the day, PDF is a silo-document,


There are hyperlinks in, and hyperlinks out. Embedding.
Except that HTML can be source-edited as text, I am
not sure what you mean by 'silo', then.


I've touched on data-silo earlier. Yes, certainly parts of a PDF can be 
linked to, and it can link out, but again, how good and reliable is 
that UX across devices, OSes, and viewers?



  and it is not a ubiquitous reading/interactive
experience in different devices.


More consistent than other choices by design. Perhaps
not as widely available as HTML and JPEG, but close.


I suppose we should define pixel accuracy, but I agree with you on 
consistency. I do not think that PDF is anywhere close to HTML's 
penetration across devices, but, if you have the numbers for that, I'd 
be happy to change my view on this particular point.



Keep in mind that this will most likely treat the data
as a separate island, disassociated from the context in which it appears.



Re: How do you explore a SPARQL Endpoint?

2015-01-22 Thread Sarven Capadisli

On 2015-01-22 15:09, Juan Sequeda wrote:

Assume you are given a URL for a SPARQL endpoint. You have no idea what
data is being exposed.

What do you do to explore that endpoint? What queries do you write?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com



I suspect that the obligatory query that everyone is dying to know is to 
get a distinct count of subjects. /me mumbles..


More realistically:

* I would say that getting a sense of which vocabularies/ontologies are 
used is a good way to dive in, and to come up with more 
specific/useful/interesting queries thereafter.


* Look up and see which VoID information is available. Related to the 
above point, e.g., void:vocabulary.


* Check whether there are sufficient human-readable labels for a 
significant portion of the instances.


* Check triples pertaining to provenance.

* Check if there are sufficient interlinks to resources that are 
presumably external to the domain the endpoint is at.



Sorry, I'm not going to write out SPARQL queries here. Need to preserve 
brain-cells for the remainder of the day.
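
Okay, one freebie - a minimal sketch for the first point above 
(predicate counts; the predicate namespaces give a rough map of the 
vocabularies in use). Whether a given endpoint will answer it within 
its timeout is another matter:

    SELECT ?p (COUNT(*) AS ?uses)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?uses)
    LIMIT 100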


-Sarven
http://csarven.ca/#i



Re: linked open data and PDF

2015-01-21 Thread Sarven Capadisli

On 2015-01-20 18:28, Larry Masinter wrote:

There's some background that you might find helpful
in the discussion.

PDF is now defined by ISO 32000.
PDF has profiles, including PDF/A-3
http://www.digitalpreservation.gov/formats/fdd/fdd000360.shtml
ISO 19005-3. PDF/A-3 defines how to add arbitrary
file attachments to PDF.

XMP http://en.wikipedia.org/wiki/Extensible_Metadata_Platform
is (as of 2012) also an ISO standard, ISO 16684-1, a
format-independent metadata representation
that uses a restricted RDF/XML framework, but
not arbitrary RDF/XML.

A design from scratch today might make different
choices, of course. But for those whose
goal is deployment and integration
with existing workflows, then reuse of what is widely
deployed seems like a path worth investigating.

And XMP is widely implemented not just for PDF but
also for images, as a way of extending metadata
beyond EXIF or IPTC.

Putting linked data in compact form (CSV, for example)
might make sense, perhaps as a PDF/A-3 file attachment,
if a document is a carrier of tabular data.

Image formats like JPEG and PNG (for which there
is support for XMP) don't have a standard, uniform
way of attaching other files, though, so allowing
data (or a pointer to external data) in the XMP
would broaden the applicability.

In choosing how to make five star open data work
for file formats other than HTML, what other choices
are there?


I would argue that declarative programs are most suitable. Others may 
disagree. AFAIK, there is no single widely accepted view on this.


re: existing workflows, would you mind sharing your thoughts on how 
the 4th star, "use URIs to denote things, so that people can point at 
your stuff", may be achieved? Say we have:


http://example.org/foo.pdf

and that we go with XMP out of the box, irrespective of the RDF 
serialization it embeds. How can the 3rd LD design principle, "when 
someone looks up a URI, provide useful information, using the standards 
(RDF*, SPARQL)", be satisfied?


Example: I want to discover the variables that are declared in the 
hypothesis of papers.


What would the PDF/XMP look like?

How can I extract the information (without breaking my head) using 
off-the-shelf *open* tools?



Sure, not all PDFs have good quality XMP metadata,
but not all HTML has quality RDFa or metadata either.


I can agree to that. We can also look at it this way: the majority of 
Web pages are essentially broken, yet the Web somehow just works. 
How would/does a PDF look or work on the Web if a non-trivial byte is 
off - never mind the XMP?


-Sarven
http://csarven.ca/#i



Re: linked open data and PDF

2015-01-19 Thread Sarven Capadisli

On 2015-01-19 21:20, Sarven Capadisli wrote:

Here is another paper: http://linked-reseach.270a.info/


Typo:

http://linked-research.270a.info/

:)




Re: linked open data and PDF

2015-01-19 Thread Sarven Capadisli

On 2015-01-19 20:36, Larry Masinter wrote:

I just joined this list. I’m looking to help improve the story for Linked Open 
Data in PDF, to lift PDF (and other formats) from one-star to five, perhaps 
using XMP. I’ve found a few hints in the mailing list archive here.
http://lists.w3.org/Archives/Public/public-lod/2014Oct/0169.html
but I’m still looking. Any clues, problem statements, sample sites?

Larry
--
http://larry.masinter.net



Hi Larry,

First off, I totally acknowledge your interest to improve the state of 
things for PDF.


I welcome being proven wrong, but for the big picture, I don't 
believe that LaTeX/XMP/PDF is the way to go for LD-friendliness - 
perhaps efforts for that are better invested elsewhere. There are a 
number of issues and shortcomings with the PDF approach which in the end 
will not play well with what the Web is intended to be, nor how it 
functions. Most importantly, it is not fault tolerant, machine-friendly 
(regardless of what can be stuffed into XMP), and will not scale. At the 
end of the day, PDF is a silo-document, its rendering is a resource-hog 
on different devices, and it is not a ubiquitous reading/interactive 
experience across devices.


For XMP/PDF to work, I presume you are going to end up dealing with 
RDF/XML, and an appropriate interface for authors to mark their 
statements with. Keep in mind that this will most likely treat the data 
as a separate island, disassociated from the context in which it appears.



May I invite you to read:

http://csarven.ca/enabling-accessible-knowledge

It covers my position in sufficient depth - not intended to be overly 
technical, but rather covering the ground rules and ongoing work.


While you are at it, please do a quick print-view from your Web browser 
(preferably in Firefox) or print to PDF.


The RDF bits are visible here:

http://www.w3.org/2012/pyRdfa/extract?uri=http%3A%2F%2Fcsarven.ca%2Fenabling-accessible-knowledge&rdfa_lite=false&vocab_expansion=false&embedded_rdf=true&validate=yes&space_preserve=true&vocab_cache_report=false&vocab_cache_bypass=false

I will spare you the details on what's going on there, unless you really 
want to know, but to put it in a nutshell: it covers statements dealing 
with sections, provenance, references/citations..


Here is another paper: http://linked-reseach.270a.info/ (which can just 
as well be a PDF - after all, PDF is just a view), which, in addition to 
the above, includes more atomic things like hypothesis, variables, 
workflows, ..



The work is based on Linked Research:

https://github.com/csarven/linked-research

If you are comfortable with your browser's developer toolbar, try 
changing the stylesheet lncs.css in the head to acm.css.


There is a whole behavioural/interactive layer which I'll skip over now, 
but you can take a look at it if you fancy JavaScript.


As you may have already noticed, the HTML template is flexible enough 
for blog posts and papers - again, this is about separating the 
structure/content from the other layers: presentation, and behaviour.



Any feedback, questions, always welcome!

-Sarven
http://csarven.ca/#i



Re: [Ann] WebVOWL 0.3 - Visualize your ontology on the web

2014-12-22 Thread Sarven Capadisli
On 2014-12-22 13:04, Steffen Lohmann wrote:
 On 21.12.2014 04:11, Melvin Carvalho wrote:
 I would normally expect a ? in the query string however, rather than
 #, which I presume is the 1337 way to hide the ontology from the server.
 
 Good point. We'll discuss it and maybe change to the query identifier (?)
 instead of the fragment identifier (#) in the next version of WebVOWL.

Melvin makes a good point, but I don't think he was necessarily
suggesting that you should change the call.

If you only intended your application to trigger the IRI retrieval
process via JavaScript (without letting it hit the server), switching to
a query string is irrelevant. The difference being that it will then hit
the server with no practical purpose.

Using the fragment, on the other hand, implies that the IRI that
follows the # is part of the document. If we look at the HTML source,
that is not the case, and presumably not the case if the base URL
returns an RDF representation either.

I would suggest either using ? and letting the server trigger everything
(which is IMO the right thing to do here, and comes with simpler/better
caching possibilities), or sticking with # and letting JavaScript manage
it all (as it is now).
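
To illustrate the distinction (with a made-up deployment URL, not
WebVOWL's actual one):

    http://example.org/webvowl?iri=http://purl.org/ontology/x
      (query component: sent to the server in the HTTP request; cacheable)

    http://example.org/webvowl#iri=http://purl.org/ontology/x
      (fragment component: never sent to the server; client-side only)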

-Sarven
http://csarven.ca/#i




Re: Debates of the European Parliament as LOD

2014-11-12 Thread Sarven Capadisli

On 2014-11-12 16:13, Martin Kaltenböck wrote:

A great approach here would be to tag/annotate the articles with EuroVoc
(http://eurovoc.europa.eu/)


Looks like the URIs end up only pointing to HTML documents e.g.:

curl -v -L -sI http://eurovoc.europa.eu/209392
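
For completeness - without claiming to have probed every representation
the server may offer - one can also ask for RDF explicitly and inspect
the response headers:

curl -v -L -sI -H "Accept: text/turtle" http://eurovoc.europa.eu/209392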

-Sarven
http://csarven.ca/#i







Re: [WWW 2015] Call for Research Papers

2014-11-04 Thread Sarven Capadisli

On 2014-11-03 19:42, Aldo Gangemi wrote:

:):):)

I followed the recent thread on using web formats in scientific publishing, and 
I agree that WWW conferences should allow us to eat our own dogfood somehow.
Unfortunately, this discussion started too late to be of any use for the next 
WWW2015.

Anyway, IW3C2 (the committee supervising WWW confs) is pretty aware of the 
discussion, and eventually, especially in the context of linked science, 
publishing data associated with papers, etc., things will start changing.

Also look at WWW2015 joint events, I’m pretty sure that there will be room for 
f2f discussion ;).

Aldo

PS did you know that George Takei (first to play lieutenant Sulu) is quite well 
connected with PR activities for Google? probably you’re closer to truth than 
expected ;)


Good news, everyone!

Aldo, thank you for the feedback :)

Will IW3C2 share some of its concerns (whether they are on the technical 
or social end) so that the enthusiastic among us can contribute?


In the spirit of the Web and its values, this is for us and we need to 
build it together.


Thanks,

-Sarven
http://csarven.ca/#i







Re: [WWW 2015] Call for Research Papers

2014-11-03 Thread Sarven Capadisli

On 2014-11-01 16:12, WWW2015 wrote:

All submitted papers must
* be formatted according to the ACM SIG Proceedings template
([4]http://www.acm.org/sigs/publications/proceedings-templates) with a
font size no smaller than 9pt
* be in PDF (make sure that the PDF can be viewed on any platform), and
formatted for US Letter size
* occupy no more than ten pages, including the abstract and appendices,
but excluding references.



It is the authors’ responsibility to ensure that their submissions
adhere strictly to the required format. Submissions that do not comply
with the above guidelines may be rejected without review.


Web: the final frontier. These are the voyages of The Web conference. 
Its continuing mission: to explore strange new evolutionary paths of the 
Web, to seek out new architectures and new user experiences, to boldly 
go where no PDF has gone before.


-Sarven
http://csarven.ca/#i





Re: Enhancing open data with identifiers

2014-10-31 Thread Sarven Capadisli

On 2014-10-31 11:33, Leigh Dodds wrote:

I thought I'd share a link to this UKODI/Thomson Reuters white paper
which was published today:

http://theodi.org/guides/data-identifiers-white-paper

Cheers,

L.



Please correct me if I'm mistaken, but here is what I see and understand:

There are two copies (two identifiers) for the white paper:

1. 
http://thomsonreuters.com/corporate/pdf/creating-value-with-identifiers-in-an-open-data-world.pdf


2. 
https://www.scribd.com/embeds/245019844/content?start_page=1&view_mode=slideshow&access_key=key-z4Ega9n1sxq3ai5A2MHs&show_recommendations=false


I think 2 (which is linked from theodi.org) is unnecessary, and arguably 
unimportant if it becomes unavailable; however, in the meantime, it 
passes the authority of that copy to a third party: scribd.com (Scribd Inc.)


The document is in PDF. Is that the best way to promote important 
information on data identifiers while factoring in everything that the 
white paper discusses?


Thanks,

-Sarven
http://csarven.ca/#i





Re: How to model valid time of resource properties?

2014-10-13 Thread Sarven Capadisli

On 2014-10-13 13:54, Frans Knibbe | Geodan wrote:

Hello!

I wonder if a way of recording changes in properties of resources can be
recommended. Many resources in real life have properties that have a
time range of being valid. In some datasets, only the current (or most
recent) state of a resource is stored, but in many cases it is important
to keep track of the history of development of a resource.

An example:

:john_smith
    a foaf:Person ;
    foaf:name "John Smith" .

Let's say that on 2013-09-27 John Smith marries Betty Jones. John Smith
is still the same person, so it makes sense to extend the same resource,
not create a new version:

:john_smith
    a foaf:Person ;
    foaf:name "John Smith" ;
    ex:marriedTo :betty_jones .

How could I efficiently express the fact that the statement :john_smith
ex:marriedTo :betty_jones is valid from 2013-09-27? And if the couple
divorces, that the property has expired after a certain date? It would
be nice if the way of modelling makes it easy to request the most recent
state of a resource, any historical state, or a list of changes during a
time period.

A quick web scan on the subject revealed some interesting research
papers, but as far as I can tell all solutions need extensions of RDF
and/or SPARQL to work.

Perhaps this question is really about the ability to make statements
about a triple? Which is a problem for which no satisfactory solution
has been found yet?

Regards,

Frans


Hi Frans,

This is not a comprehensive answer on this topic, but you might want to 
take a look at PROV-O [1] (which can address validity and history of 
entities) and maybe even employ OA [2].


"Capturing temporal dimension of linked data" by Jindřich Mynarz is an 
excellent read [3].
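
For what it's worth, one common pattern - independent of PROV-O - is to 
promote the relationship itself to a resource and hang the validity 
dates off of it. A rough Turtle sketch, with ex: as a stand-in 
vocabulary and a made-up end date:

    @prefix ex:  <http://example.org/ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    :marriage_1
        a ex:Marriage ;
        ex:partner :john_smith, :betty_jones ;
        ex:validFrom "2013-09-27"^^xsd:date ;
        ex:validUntil "2016-01-01"^^xsd:date .

Querying for the state at a given time then becomes a FILTER over those 
dates, at the cost of losing the direct :john_smith ex:marriedTo 
:betty_jones triple.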


[1] http://www.w3.org/TR/prov-o/
[2] http://www.openannotation.org/spec/core/
[3] 
http://blog.mynarz.net/2013/07/capturing-temporal-dimension-of-linked.html


-Sarven
http://csarven.ca/#i





Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-07 15:44, Peter F. Patel-Schneider wrote:

Well, I remain totally unconvinced that any current HTML solution is as
good as the current PDF setup.  Certainly htlatex is not suitable.
There may be some way to get tex4ht to do better, but no one has
provided a solution. Sarven Capadisli sent me some HTML that looks much
better, but even on a math-light paper I could see a number of
glitches.  I haven't seen anything better than that.


Would you mind creating an issue for the glitches that you are experiencing?

https://github.com/csarven/linked-research/issues

Please mention your environment and the documents you've looked at. Also 
keep in mind the LNCS and ACM SIG authoring guidelines. The purpose of 
the LNCS and ACM CSS is to adhere to the authoring guidelines so that 
the generated PDF file or print output looks as expected (within 
reason).


Much appreciated!

-Sarven
http://csarven.ca/#i






Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 14:10, Peter F. Patel-Schneider wrote:

Done.

The goal of a new paper-preparation and display system should, however,
be to be better than what is currently available.  Most HTML-based
solutions do not exploit the benefits of HTML, strangely enough.

Consider, for example, citation links.  They generally jump you to the
references section.  They should instead pop up the reference, as is
done in Wikipedia.

Similarly for links to figures.  Instead of blindly jumping to the
figure, they should do something better, perhaps popping up the figure
or, if the figure is already visible, just highlighting it.

I have put in both of these as issues.


Thanks a lot for the issues! Really great to have this feedback.

I have resolved and commented on some of those already, and will look at 
the rest very shortly.


I am all for improving the interaction as well. I'd like to state again 
that development so far has focused on adhering to the LNCS/ACM 
guidelines and improving the final PDF/print product. That is, to get on 
reasonable ground with the state of the art.


Moving on: I plan to bring in the interaction layer and a framework to 
easily enrich the document semantically, as well as the overall UX. I 
have some preliminary code in my dev branch, and will bring it forward; 
feedback on that would be welcome as well.


Thanks again and please continue to bring forward any issues or feature 
requests. Contributors are most welcome!


-Sarven
http://csarven.ca/#i






Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 15:14, Luca Matteis wrote:

Dear Sarven,

I really appreciate the work that you're doing with trying to style an
HTML page to look similar to the Latex templates. But there's so many
typesetting details that are not available in browsers, which means
you're going to do a lot of DOM hacking to be able to produce the same
quality typography that Latex is capable of. Latex will justify text,
automatically hyphenate, provide proper spacing, and other typesetting
features. Not to mention kerning. Kerning is a *huge* thing in
typography and with HTML you're stuck with creating a DOM element for
every single letter - yup you heard me right.

I think it would be super cool to create some sort of JavaScript
framework that would enable the same level of typography that Latex is
capable of, but you'll eventually hit some hard limitations and you'll
probably be stuck drawing on a canvas.

What are your ideas regarding these problems?


We do not have to have everything pixel perfect and comprehensive all up 
front. That is a common pitfall. Applying the Pareto principle is 
preferable.


LaTeX is great for what it is intended for! This was never in question. 
We are, however, looking at a bigger picture for Web Science 
communication and access. There will be far more concerns than the 
presentation layer alone.


As for your technical questions: we need to create issues or feature 
requests, and more importantly, open discussions like in these threads, 
to better understand what the SW research community's needs are. So, 
please create an issue, because what you raise is important and should 
be looked into further. I do not have all the technical answers, even 
though I am very close to the world of typeface, typography, and book 
design :)


In any case, if it is possible in LaTeX, I hope it is not naive of me 
to say that it can be achieved (if not already) in HTML+CSS+JavaScript.


-Sarven
http://csarven.ca/#i





Re: scientific publishing process (was Re: Cost and access)

2014-10-08 Thread Sarven Capadisli

On 2014-10-08 18:38, Kingsley Idehen wrote:

Sarven,

Linked Open Data dogfooding, re., issue tracking i.e., a 5-Star Linked
Open Data URI that identifies Github issue tracker for Linked Data Research:

[1]
http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4
-- Linked Open Data URI (basic entity description page)
[2] http://linkeddata.uriburner.com/c/8FDBH7 -- deeper follow-your-nose
over relations facets oriented entity description page
[3]
http://bit.ly/vapor-report-on-linked-data-uri-that-identifies-a-github-issue-re-linked-research-data
-- Vapor Report (re., Linked Open Data principles adherence) .


It's pretty cool that you can grab stuff out of GitHub issues, even 
comments!


Papers link to code and then to commits and issues. See also [1].

Even comments e.g., [2]. Or even in the direction of paper comments 
which can be integrated and picked right up from the page e.g., [3]. 
Just need to add +/-1 buttons and triplify the review ;) With WebID+ACL, 
we have the rest.


Do I have write access (via WebID?) to something like [4]? e.g., 
deleting an older label or triple :)


[1] http://git2prov.org/
[2] 
https://linkeddata.uriburner.com/about/html/http/csarven.ca/call-for-linked-research
[3] 
https://linkeddata.uriburner.com/about/html/http/csarven.ca/sense-of-lsd-analysis%01comment_20140808164434
[4] 
http://linkeddata.uriburner.com/about/html/http://linkeddata.uriburner.com/about/id/entity/https/github.com/csarven/linked-research/issues/4


-Sarven





Re: scientific publishing process (was Re: Cost and access)

2014-10-07 Thread Sarven Capadisli

On 2014-10-07 11:39, Norman Gray wrote:

The original spark to the thread was a lament that SW and LD conferences don't 
mandate something XMLish for submissions because X(HT)ML is clearly better 
for... well ... dammit, it's Better.


Straw man argument. Please stop that now!

I will spell out the main proposal and purpose for you because it sounds 
like you are completely oblivious to them. Let me know if anything is 
unclear.


* Conferences on SW/LD research should encourage and allow submissions 
using the Web native technology stack (e.g., starting from HTML and 
friends for instance) alongside the existing requirements. As the 
required submission in PDF can be generated via HTML+CSS, those that 
wish to arrive at the PDF by their own means can still do so, without 
asking or forcing the existing authorship or review process to change. 
It is backwards compatible. The underlying idea is to use our 
own technologies, not only for the sake of using them, but also to 
identify the pains as a precursor to raising the quality of the 
(Semantic) Web stack for scientific research publishing, discovery, and 
reuse. This is plain and simple dogfooding and it is important.


* There is an opportunity for granular data discovery, reuse, and 
machines to aid in the reproducibility of scientific research. This goes 
completely beyond off-the-shelf metadata, e.g., author, title, subject, 
or what you can stuff into LaTeX+Whatever, not to mention mangling 
around what's primarily intended for desktop and print, to squeeze in 
some Web in there. We are talking about making reasonable strides 
towards having scientific knowledge that is universally accessible 
on the Web. PDF and friends do not fit into that equation that well, 
however, no one is blocked from doing what they already do. Some of us 
would like to do a bit more than that to test things out so that we can 
collectively have more wins.


* There is also an opportunity to attract more funding and interest 
groups, if we can better assess the state of Web Science. This is 
simply due to the fact that we would be able to mine more useful 
information from existing research. Moreover, we can identify research 
areas of potential value better. It is to elevate the support that we 
can get from machines to excel and to do our work better. This is in 
contrast to what we can currently achieve with the existing workflow 
i.e., the current process is only concerned about making it easy for 
the author, reviewer, and publisher, and not about gleaning 
high-fidelity information.



A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.  That practically means 
metadata, plus perhaps some document structure, plus, if you're keen and can 
get the authors to invest their effort, some argumentation.  That's available 
for free (and right now) from LaTeX authors, and available from XHTML authors 
depending on how hard it would be to get them to put @profile attribute in the 
right places.
That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.  I believe that the demo in my earlier 
message undermines that claim as far as RDF goes.


Let me get this right: you are advocating that LaTeX + RDF/XML + 
whatever processes one has to go through, is a more sensible approach 
than HTML? If so, we have a different view on what creates a good UX.


It may come as news to you, but the SW/LD community is not in favour of 
authors using RDF/XML unless it is completely within some tool-chain 
left for machines to deal with. There are alternative RDF notations 
which are preferable. You should look it up. The problem with your 
proposal is that the author has to boggle their mind with two 
completely different syntaxes (LaTeX and RDF/XML), whereas the original 
proposal was to deal with one, i.e., HTML. Styling is no more of an 
issue than with LaTeX, where the templates are provided; for HTML, I've 
made a modest PoC with:


https://github.com/csarven/linked-research

However, you are somehow completely oblivious to that, even though it 
was mentioned several times now on this mailing list. No, it is not 
perfect, and yes, it can be better. There are alternative solutions to 
achieve something along those lines with the same vision in mind, which 
are all okay too.


If this is not about coding, but rather using WYSIWYG editors or 
authoring/publication tools, have a look and try a few here or from a 
service near you:


* http://en.wikipedia.org/wiki/Comparison_of_HTML_editors

* http://en.wikipedia.org/wiki/List_of_content_management_systems

Or you know, take 30 seconds to create a WordPress account and another 
30 seconds to publish. Let me know if you still think that's 
insufficient or completely unreasonable / difficult for Web Science 
people to handle.


So, *do as you like, but do not prevent me* from encouraging the 
SW/LD 

Re: scientific publishing process (was Re: Cost and access)

2014-10-06 Thread Sarven Capadisli

On 2014-10-06 06:59, Ivan Herman wrote:
 Of course, I could expect a Web technology related crowd to use HTML 
source editing directly, but the experience of Daniel and myself with the 
World Wide Web conference(!) is that people do not want to do that. 
(Researchers in, say, Web Search have proven to be unable or unwilling 
to edit HTML source. It was a real surprise...). I.e., the authoring 
tool offerings are still limited.


Can you please elaborate on that? When was that and what tools were 
available or used? Do you have any documentation on the landscape from 
that time that we can use or learn from?


My understanding is that you've experienced some issues about a decade 
ago and your reasoning is clouded by that. Do you think that it would be 
fair to revisit the situation based on today's landscape and see how it 
will play out?


From my perspective, we should have a bit more faith in the SW 
community because then we might actually strive to deliver, as opposed 
to walking away from the problem.


Like I said in my previous emails (which I'm sure you've read), the 
current workshops on SW/LD research publishing did not deliver. Why do 
you have so much faith in waiting it out, hoping that they will deliver? 
They might, and I hope they do. But I'm not putting all my chips on 
that option alone. I would rather see grass-roots efforts in parallel, 
e.g., http://csarven.ca/call-for-linked-research


How many human hours have gone into CfPs on Linked Science + Semantic 
Publishing so far? How has the delivery of machine- and human-friendly 
research changed or evolved? What's visible or countable? On that front, 
what can we do right now that wasn't possible 5-10 years ago?


In the meantime, if the conferences and workshops can get back on track 
and motivate people (at least), we would not only see more value drawn 
out of SW research, but also growing funding opportunities, and faster 
progress across the field.


I am disappointed by the fact that, instead of addressing the core 
issue - "can the conferences allow or encourage the Web stack?" - we are 
discussing distractions, e.g., perfection in authoring tools. Every user 
has their own preferences, i.e., some will code, some will use tool X. 
What you are suggesting is that we wait it out, because developments may 
reveal the perfect authorship tooling. If that were ever going to be the 
case, we'd see it in the general market, not something that might 
one day emerge out of SW/LD workshops.


I will bet that if the requirements evolve towards Webby submissions, 
within 3-5 years' time, we'd see a notable change in how we collect, 
document, and mine scientific research in SW. This is not just being 
hopeful. I believe that if all of the newcomers into the (academic) 
research scene start from HTML (and friends) instead of LaTeX/Word (and 
friends), we wouldn't be having this discussion. If the newcomers are 
told to deal with LaTeX/Word (regardless of hand-coding or using a 
WYSIWYG editor) today, they are going to do exactly that. That basically 
pushes the date further for a complete switchover to Webby tools, 
because the majority of those researchers would have to be flushed out 
of the system before the next wave of Webby users can have their chance.


Even if we have all of the perfect or appropriate tooling (which I think 
is the wrong thing to aim for) right now, it will still take a few years 
to flush out the current LaTeX/Word users or have them evolve. I would 
rather see the smallest change happen right now than nothing at all.


*AGAIN*, technology is not the problem. #DIY

-Sarven
http://csarven.ca/#i





Re: Publication of scientific research

2014-10-04 Thread Sarven Capadisli

On 2013-04-25 15:42, Daniel Schwabe wrote:

Sarven and all,
I don't have the answers to your questions. But I find it interesting that we 
could at least do a survey with authors. But we would really have to at least 
mention some *reasonable* tools that are available, otherwise I'm afraid their 
positions won't change from before.
I will discuss this within IW3C2 and see if we can include a question about 
this in one of the pre- or post- WWW conferences surveys.
In  the meantime, perhaps SWSA (who promotes ISWC) might want to follow up on 
this idea as well.
Cheers
D


Hi Daniel,

If you have any follow-up information on that, would you mind sharing?

Sorry to bring this up a year and a half later, but I'm still interested.

Thanks,

-Sarven



On Apr 25, 2013, at 10:29  - 25/04/13, Sarven Capadisli i...@csarven.ca wrote:


On 04/24/2013 09:39 PM, Daniel Schwabe wrote:

Some years ago, IW3C2, which promotes the WWW conference, of which I am a 
member, and which is very interested in furthering the use of Web standards, 
for all the reasons that have already been mentioned in this discussion, 
decided to ask authors to submit papers in (X)HTML. After all, WWW is a *Web* 
conference! (This was before RDF and its associated tools were available.)
The bottom line was that authors REFUSED to submit in this format, partly 
because of lack of tools, partly because they were just comfortable with the 
existing tools. There were so many that it would have simply ruined the 
conference if the organization simply refused these submissions.
The objection was so strong that IW3C2 eventually had to change its mind, and 
keep it they way it was, and currently is.
Clearly, for some specialized communities, certain alternative formats may be 
acceptable - ontologies, in the context of sepublica, make perfect sense as an 
acceptable submission format. But when dealing with a more general audience, I 
do not believe we have the power to FORCE people to adopt any single 
specialized format - as everything else, these things emerge from a community 
consensus over time, even if first spearheaded by a smaller core group.
Before that happens, we need to have a very clear value proposition and, most 
of all, good tools for people to accept and change. Most people will not change 
their ways if not convinced that it's worth the additional effort - and having 
really good tools is a sine qua non requirement for this.
On the other hand, efforts continue to at least provide metadata in RDF, which 
has been surprisingly harder to produce year after year without requiring hand 
coding and customization each time. But we will get there, I hope.
Just my 2c...


Hi Daniel, thank you for that invaluable background.

I'll ask the community: what is the real lesson from this and how can we 
improve?

What's more important: keeping the conference running or some ideals?

Was that reaction from authors expected? Will it ever be different?

What would have happened if IW3C2 had stood its ground? What would happen if 
conferences take a stand - where will authors migrate?

What would be the short and long term consequences?

Not that I challenge this, but are we sure that it is the lack of good tools 
that's holding things back? What would make the authors happy? Was there a 
survey on this?

-Sarven




Daniel Schwabe
Dept. de Informatica, PUC-Rio
R. M. de S. Vicente, 225
Rio de Janeiro, RJ 22453-900, Brasil
Tel: +55-21-3527 1500 r. 4356
Fax: +55-21-3527 1530
http://www.inf.puc-rio.br/~dschwabe














Re: scientific publishing process (was Re: Cost and access)

2014-10-04 Thread Sarven Capadisli

On 2014-10-04 04:14, Daniel Schwabe wrote:

As is often the case on the Internet, this discussion gives me a terrible sense 
of déjà vu. We've had this discussion many times before.
Some years back the IW3C2 (the steering committee for the WWW conference 
series, of which I am part) first tried to require HTML for the WWW conference 
paper submissions, then was forced to make it optional because authors simply 
refused to write in HTML, and eventually dropped it because NO ONE (ok, very 
very few hardy souls) actually sent in HTML submissions.
Our conclusion at the time was that the tools simply were not there, and it was 
too much of a PITA for people to produce HTML instead of using the text editors 
they are used to. Things don't seem to have changed much since.


Hi Daniel, here is my long reply as usual and I hope you'll give it a 
shot :)


I've offered *a* solution that is compatible with the existing workflow, 
without asking for any extra work from the OC/PCs, with the exception 
that the Web-native technologies for the submissions are officially 
encouraged. They will get their PDF in the end to cater to the existing 
pipeline. In the meantime, the community retains higher-quality research 
documents.



And this is simply looking at formatting the pages, never mind the whole issue of 
actually producing hypertext (i.e., turning the article's text into linked hypertext), 
beyond the easily automated ones (e.g., links to authors, references to papers, etc.). 
Producing good hypertext, and consuming it, is much harder than writing plain text. And 
most authors are not trained in producing this kind of content. Making this actually 
semantic in some sense is still, in my view, a research topic, not a routine 
reality.
Until we have robust tools that make it as easy for authors to write papers 
with the advantages afforded by PDF, without its shortcomings, I do not see 
this changing.


I disagree that we don't have sufficient or robust tools to author and 
publish web pages. I find it ironic that we are still debating this 
issue as if we are in the early-mid 90s. Or ignoring [2], or the 
possibility to use a service which offers [3] to publish a (pardon me 
for saying it) friggin' web page.


If it is about "coding", I find it unreasonable or unprofessional to 
think that a Computer/Web Scientist in 2014 who is publicly funded for 
their academic endeavours is incapable of grokking HTML. But somehow 
LaTeX is presumed to be okay for the new post-graduate that's coming in. 
Really? Or is the real reason that no one is asking them to do otherwise?


They can randomly pick a WYSIWYG editor tool or an existing publishing 
service. No one is forcing anyone to hand-code anything. Just as no one 
is forced to hand code LaTeX.


We have the tools and even services to help us do all of that, both 
from and outside of SW. We have had them for a long time. What was 
lacking was a continuous green light to use them. That light stopped 
flashing, as you've mentioned.


But again, our core problems are not technical in nature.


I would love to see experiments (e.g., certain workshops) to try it out before 
making this a requirement for whole conferences.


I disagree. The fact that workshops or tracks on linked science or 
semantic publishing didn't deliver is a clear sign that they have the 
wrong process at the root. When those workshops ask for submissions to 
be in PDF, that's the definition of irony. There are no useful 
machine-friendly research objects! An opportunity lost at every single CfP.


Yet, we eloquently describe hypothetical systems or tools that will one 
day do all the magic for us instead of taking a good look at what's 
right in front of us.


So, let's talk about putting the cart before the horse. A lot of time and 
energy (e.g., public funding) could have been better used simply by 
actually *having the data*, and then figuring out how to utilize it. 
There is no data, so what's there to analyze or learn from? Some 
research trying to figure out what to do with trivial and limited 
metadata e.g., title, abstract, authors, subjects? Is 
data.semanticweb.org (dog food) the best we can show for our 
dogfooding ability?


I can't search/query for research knowledge on topic T, that used 
variables X, Y, which implemented a workflow step S, that's cited by or 
used those exact parameters, that happens to use the datasets that I'm 
planning to use in my research.
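
To make that concrete, here is a sketch of the kind of query I would 
like to be able to run - ex: is a stand-in vocabulary, since the absence 
of shared, populated vocabularies across our proceedings is precisely 
the point:

    PREFIX ex: <http://example.org/vocab#>

    SELECT DISTINCT ?paper
    WHERE {
      ?paper ex:topic        ex:T ;
             ex:usesVariable ex:X, ex:Y ;
             ex:workflowStep ?step ;
             ex:usesDataset  <http://example.org/dataset/D> .
    }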


Reproducibility: 0
Comparability: 0
Discovery: 0
Reuse: 0
H-Index: +1?


Bernadette's suggestions are a good step in this direction, although I suspect 
it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).


Nothing is stopping us from doing things in parallel, and we are, in 
fact: close-by efforts from workshops to Force11, public-dwbp-wg, 
public-digipub-ig, .. to recommendations, e.g., PROV-O, OPMW, SIO, SPAR, 
besides the whole SW/LD stack, which benefits scientific research 
communication and 

Re: Publication of scientific research

2014-10-04 Thread Sarven Capadisli

On 2013-04-29 19:29, Andrea Splendiani wrote:

Hi,

ok. Let's see if we can offer xhtml+RDFa as an additional format, and see how 
people react. I'll spread the idea a bit.

best,
Andrea


Hi Andrea,

Care to share the feedback that you've received?

Thanks,

-Sarven


On 27 Apr 2013, at 23:05, Sarven Capadisli i...@csarven.ca wrote:


On 04/27/2013 02:31 AM, Andrea Splendiani wrote:

I'm involved in the organization of a couple of conferences and workshops.
You do need a template, as without this it's hard to have homogeneous 
submissions (even for simple things such as page - or HTML-equivalent - 
length).
Other than this, the main issue I could see is that proceedings may require PDF 
anyway.
But we could make a partial step and ask for abstracts in XHTML+RDFa, to be 
included in the online program, and full papers in XHTML+RDFa as an optional 
submission.
It's a small step, but a first step.
I'm very short on time, but if there is a template, I can see if the idea finds 
some interest.
I have to admit that asking for PDFs sounds a bit retro!


Challenge accepted!

Here is the egg in XHTML+RDFa with CSS for LNCS and ACM SIG:

https://github.com/csarven/linked-research

See live:

http://linked-research.270a.info/

Will you provide the chicken? :D

I got it as close to LNCS template as possible for now. ACM SIG is on its way - 
(change stylesheet from lncs.css to acmsig.css to see current state).

I know it is not perfect, however I think it is a decent start. Best viewed in 
Firefox/Chrom(e|ium). Print to paper, or PDF from your browser to see 
alternative views. What do you think?

Now, before some of you pixel-perfectionist folks yell at me, please first 
chillax and create an issue at GitHub. Or better yet, contribute with pull 
requests. It is using Apache License 2.0. But, all feedback is most welcome!

I still don't think this is the main challenge. We need go-aheads from 
conferences, then we can hack up the best templates and stylesheets that 
this universe has ever seen.

-Sarven













Visibility of the data (was Re: Formats and icing)

2014-10-04 Thread Sarven Capadisli

On 2014-10-02 00:48, Sarven Capadisli wrote:

On 2014-10-01 21:51, Kingsley Idehen wrote:

On 10/1/14 2:42 PM, Sarven Capadisli wrote:

can't use them along with schema.org.

I favour plain HTML+CSS+RDFa to get things going e.g.:

https://github.com/csarven/linked-research


What about:

HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ?

Basically, we have to get to:

HTML+CSS+(Any RDF Notation) .


Sure, why not!


Actually, I'd like to make a brief comment on this. While I agree with 
(and enjoy) your eloquent explanations on RDF, Languages, and Notations, 
and that any RDF Notation is entirely reasonable (because we can go 
from one to another with relative ease), we shouldn't overlook one 
important dimension:


*Visibility* of the data.

Perhaps this is left better as a best practice than anything else, but 
in my opinion:


RDFa is ideal when dealing with HTML for research knowledge because, if 
applied correctly, it will declare all of the visible portions of the 
research process and knowledge. It is to make the information available 
as first-class data as opposed to metadata. It is less likely to be left 
behind or go stale because it is visible to the human at all times.
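
A contrived sketch (ex: as a stand-in vocabulary) of what I mean by 
visible, first-class data - the human-readable sentence and the 
machine-readable statements are one and the same:

    <p prefix="ex: http://example.org/vocab#" resource="#hypothesis"
       typeof="ex:Hypothesis">
      We hypothesise that <span property="ex:variable">X</span> correlates
      with <span property="ex:variable">Y</span>.
    </p>

Remove the markup and the sentence still reads; remove the sentence and 
the data goes with it.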


This is in contrast to JSON-LD or Turtle, where they will be treated as 
"dark" metadata, or at least create duplicate information that is subject 
to desynchronization. While JSON-LD and Turtle have their strengths, they 
are unnecessary for the most relevant parts of the document, which are 
already visible, e.g., concepts, hypothesis, methodological steps, 
variables, figures, tables, evaluation, conclusions.


Again, this is not meant to force anyone to use a particular RDF 
notation. Getting HTML+CSS in the picture is a huge win in itself as far 
as I'm concerned :) Then applying an RDF notation is a nice, reasonable 
step forward.


* I am conveniently leaving out Microdata from this discussion because I 
don't feel it is relevant anymore.


-Sarven
http://csarven.ca/#i





Cost and access (Was Re: [ESWC 2015] First Call for Paper)

2014-10-03 Thread Sarven Capadisli

On 2014-10-02 13:50, John Domingue wrote:

As well as being irritating, UK academics submitting to ESWC run the
risk that their papers will not be open to REF submission; even if they
are, we have to go to additional efforts to ensure they are green OA
published. This is also true of ISWC which makes the semantic web a
pretty unattractive area to do research in.


for both ISWC and ESWC the PDFs are freely available e.g. see [1]

John

[1] http://2014.eswc-conferences.org/program/accepted-papers


It is great that some agreements between the conferences and the 
publishers allow open access e.g., [1].


However, lets not forget that:

1) a good chunk of publicly funded research is produced and reviewed for 
free, meanwhile:


2) the public still ends up paying for the research submissions i.e., 
institutions pay their fees to subscribe to the periodicals from the 
publisher.


So, not only are we working for free, we are paying again for the 
research that we've produced. And all the while, insisting on making it 
easier and preferable for the publisher.


Having said that, there is no need to pile on the publisher. After all, 
they have a business, and the institutions are willing to pay for their 
services and products. That's okay.


Many in the SW field are interested in discovering research output 
with great precision, without having to go through the publisher, or 
having to use a common search engine to look endlessly for keywords for 
something mildly relevant. We are all in fact working towards that 
universal access to information - I think TimBL said a few things on 
that silly little topic. IMO, this is where it becomes apparent that the 
level of openness that's offered by the publisher is superficial and 
archaic.


The SW community can do much better by removing the unnecessary controls 
that are in place to control the flow of information. This is where we 
should wake up. :)


-Sarven
http://csarven.ca/#i





Re: Cost and access (Was Re: [ESWC 2015] First Call for Paper)

2014-10-03 Thread Sarven Capadisli

On 2014-10-03 13:36, Eric Prud'hommeaux wrote:

Let's work through the requirements and a plausible migration plan. We need:


Agreed. In favour of taking action.

Just to separate and emphasize the issues. The original request was 
merely:


"Will you consider encouraging the use of Semantic Web / Linked Data 
technologies for Extended Semantic Web Conference paper submissions?"


or

"Will you compromise on the submission such that the submissions can be 
in PDF and/or in HTML(+RDFa)?"


This, in my view, attempts to retain the existing workflow. There is 
nothing here that tries to solve everything (as some misinterpret or 
paint it as such). Incremental actions are preferable to throwing our 
hands in the air and running away frantically from a problem that the 
community brought onto itself.


This is about creating awareness and embracing Web-native technologies 
for SW research submissions, provided that the final presentation (i.e., 
in PDF) complies with the requested template, which is passed to the 
publisher in the end.


Just to elaborate on that: while the submissions in the end may only be 
in PDF (although it would be great to work it out without that - one 
step at a time, right?), the fact that the submission guidelines would 
acknowledge the importance of, and flexibility in, creating, sharing, 
and preserving research knowledge using the technologies the conference 
is all about should not be underestimated.


As a plus, authors that are on their way from, say, HTML+CSS to 
PDF have the opportunity and willingness to make their research 
contributions publicly accessible under a Web space that they control. 
The source format used to represent this information sets the tone for 
the remaining phases. That is, if LaTeX/Word is the source, then it is 
extra work to get HTML out of it, and many would not and do not (in 
fact) bother. However, if HTML is the source (for instance), then we 
retain that possibility - all while the publisher gets their PDF (e.g., 
via HTML+CSS to a print file), and authors fulfil their 
academic/research requirements.


Moving on:


1 persistent storage: it's hard to beat books for a feeling of
persistence. Contracts with trusted archival institutions can help but
we might also want some assurances that the protocols and formats will
persist as well. It would be possible to have a fallback contract with a
conventional publisher but it's hard to see what's in it for them if
they have to paper print everything or migrate to a new format when the
Web loses way to something else. Maybe it's more pragmatic to forgoe
these assurances of persistence and just hope that economic interests
protect the valuable stuff.



This is out of my area, but as I understand it, going from digital 
source to print is just a view or materialization of said knowledge. 
History has shown that both PDF and HTML are sufficient for storage.


Those that wish to archive via PDF can do so. It is just a view after 
all. However, that one particular view to store knowledge need not set 
the tone for everything else. I think the tool-chain around HTML/XML 
tries to lift those restrictions. For instance, with HTML we are free to 
create any suitable presentation for any device with CSS.



2 impact factor: i have the impression that conventional publishers have
a bit of a monopoly and and sudden disruption would be hard to engineer.
How do to get leading researchers to devote their work in some new
crackpot e-journal to the exclusion of other articles which will earn
them more points towards tenure and grants? Perhaps the answer is slowly
build the impact factor; perhaps it's some sort of revolution in the
minds of administrators and funders.


I'd like to be optimistic about this and entertain the idea that either 
the current journals will evolve or a new line of journals will seek, 
embrace and truly employ the scientific method with the aid of available 
technologies. At this time, it is difficult to rely solely on 
human-only peer reviews, because they are time-consuming and error-prone. 
If reviewers have the opportunity to investigate with better support 
from machines, the truthfulness and reproducibility of a given piece of 
research can be better verified.


We are certainly heading in that direction with all the work that goes 
on in SW and other fields. The bottleneck is that, right now, it is not 
seriously given the light of day, or even tested out. When SW/LD 
conferences resist coming to terms with supporting their own 
fundamentals or vision for research submissions, how is what we 
currently have in any way desirable?


Just to be clear, the original proposal is not for all of science to 
adopt. It is for international semantic web conferences. That's the 
minimal step we can take.


So, I agree, some revolution, or maybe just evolution, on the idea of 
putting our own technologies to the test will contribute towards 
increasing the impact factor.

SW/LD researchers groking LaTeX and HTML (Was vs Re: [ESWC 2015] First Call for Paper)

2014-10-02 Thread Sarven Capadisli

On 2014-10-01 22:32, Pablo N. Mendes wrote:

Or at least is it as easy to write this HTML as it is to write in LaTeX?


If a SW/LD computer scientist researcher can manage to deal with 
LaTeX, would it be presumptuous to say that they can probably manage HTML?


If a non-computer scientist can get a Web page up or use an existing 
blogging software/service to publish some information, do you think that 
the average SW/LD researcher will be able to cope with that? Or are we asking for 
too much from the SW/LD researcher here?


At this point, we are not even talking about putting RDF information in 
some POSH. Let's try to get the SW/LD research community to catch up to 
20 years ago.


In the end, some people will write code, some people will use a WYSIWYG 
editor of their liking. Those that wish to use an existing tooling, can 
probably pick one at random here:


http://en.wikipedia.org/wiki/Comparison_of_HTML_editors

Or they can take a minute to install or use a service that supports one 
at random here:


http://en.wikipedia.org/wiki/List_of_content_management_systems

But, to answer your question, I think that, if a Web Science researcher 
can figure out how to write \paragraph, \par, \begin, carriage-return, or 
whatever ... (and, I'm totally making a shot in the dark, super wild 
guess here), I think they can figure out <p></p>.


What do you think?

-Sarven
http://csarven.ca/#i






Re: Technical challenges (Was Re: [ESWC 2015] First Call for Paper)

2014-10-02 Thread Sarven Capadisli

On 2014-10-02 01:48, Pablo N. Mendes wrote:

I never claimed it was not social. But as it is often the case, it may
be rooted in or (falsely) justified by technical misinformation.


You are right, that is of course likely. However, based on the feedback, 
the bottleneck is the OCs passing on the orders as they were instructed 
by the publisher. Quite a few SW/LD OCs also feel that PDF is the 
appropriate way to proceed today, and even tomorrow. There are of course 
OCs that do feel and encourage that the SW/LD tool stack is the right way 
to go, but we can clearly see what the end result is.


Let me put it this way: even workshops on Linked Science or Semantic 
Publishing are advocating the use of PDF for paper submissions. Surely, 
they know the potential of the technologies that they are behind.


So, I don't think that all these people are uninformed or do not believe 
in or understand the technologies, but that they are simply following how 
things have always been, without daring to rock the boat for potentially 
something better.


OCs of major SW/LD conferences want to ensure that things stay on course, 
i.e., 1) the conference makes money to survive, and 2) it gets 
sufficient paper submissions by keeping the barrier low for researchers.


This is all about obedience, laziness, and carelessness. Technical 
(mis)information is not even on the radar at this point.


Having said that, that's all okay and at the same time irrelevant. If 
the authors can follow something along the lines of:


http://csarven.ca/call-for-linked-research

we should see notable changes.


Sorry for having missed the discussions that addressed these concerns.
Thanks for reiterating, because others may have missed them too. Perhaps
we should go even further in breaking this down to the inattentive list
member. You may get better bang for your buck if you distill the info
into one/two-sentence questions and answers. The past threads you point
out may be a great starting point. By giving pointers to concise and
objective FAQs with examples that remove all doubt about technical
issues, even the busiest PC chair will have no excuses to not accept
those submissions. Others may even feel compelled to encourage them.


The goal is to try to get conferences, supervisors and authors all on 
board to do their share.



I, for one, plan to organize a workshop next year, and would like to
encourage submissions in Webby formats.


That's great to hear! Thank you. Please do share the details of your 
workshop when possible, or if you are interested in any feedback.


-Sarven
http://csarven.ca/#i






Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)

2014-10-02 Thread Sarven Capadisli

On 2014-10-02 10:36, Ghislain Atemezing wrote:

On 01/10/2014 21:55, Luca Matteis wrote:

But why is it backwards? We have different formats serving different
purposes. Diversity is healthy. Simply because PDF is not in the Web
stack it doesn't make it Web-unfriendly.


In 2013, PDF was mentioned during ODW2013 [0] workshop and I quote part
of the final report [1] below regarding PDF:

(...) PDF - often referred to as the format where data goes to die. In
the open data world, PDF has a bad name as it is not deemed machine
processable. As Adobe's Jim King pointed out in his presentation [2] ,
this is perhaps unfair. PDF can include structured tables, can carry
associated metadata, extractable text and more. It is the way that PDFs
are generated - using basic tools that don't support all the features -
that renders PDF documents opaque to machine processes.

This could be an opportunity to work closer with Adobe's folks to see
how web stack can help process data in PDF...

Best,
Ghislain

[0] http://www.w3.org/2013/04/odw/
[1] http://www.w3.org/2013/04/odw/report
[2] http://www.w3.org/2013/04/odw/Role_of_PDF_and_Opendata_final.pdf


Thanks for sharing Ghislain.

Lets not forget that we have SW/LD supporters that go after public 
institutions to aim for 5-star Linked Data. Or ask for public funding to 
support their SW/LD research.


Ironic Facts:

* Majority of the SW/LD research output is publicly funded
* Majority of the SW/LD research venues promote 1-star Linked Data

So, yes, we can do a lot of different things and in fact, a lot of 
people are doing different things to improve open science and communication.


The question is, what efforts are the SW/LD research venues making? How 
are they compromising or improving the state of things? What has changed 
in recent memory?


-Sarven
http://csarven.ca/#i





Re: SW/LD researchers groking LaTeX and HTML (Was vs Re: [ESWC 2015] First Call for Paper)

2014-10-02 Thread Sarven Capadisli

On 2014-10-02 11:08, Luca Matteis wrote:

On Thu, Oct 2, 2014 at 10:48 AM, Sarven Capadisli i...@csarven.ca wrote:

If a SW/LD computer scientist researcher can manage to deal with LaTeX,
would it be presumptuous to say that they can probably manage HTML?

If a non-computer scientist can get a Web page up or use an existing blogging
software/service to publish some information, do you think that the average
SW/LD will be able to cope with that? Or are we asking for too much from the
SW/LD researcher here?


You're again comparing two different technologies that are used for
different things!


I was addressing an issue on working with these technologies. It had 
nothing to do with the quality or the purpose of the technologies in 
comparison. So, why paint it as such?



HTML works great for the Web because it's
lightweight. Latex works great for papers that need to end up in a
physical journal because it has better facilities for that.

Imagine a Latex person coming to you asking why doesn't the Web
community embrace Latex as a format for rendering web pages.

HTML can't and won't work for everything. So let's not presume it will;
finding solutions such as Latex RDF formats, or even trying to
embed RDF statements in PDF documents, to me seems like a sensible
idea.

RDF can be expressed in Turtle, JSON-LD, HTML (RDFa), XML (RDF/XML)...
adding PDF or Latex to the group seems like a logical thing to do.


How often do SW/LD researchers pick up print journals as opposed to PDF 
files and then maybe run to their printers?


How many HTML documents did you look at in the past 24 hours to consume 
information in comparison to PDF?


So, it sounds like you are advocating the importance of print quality 
over everything else in the picture.


HTML+CSS may not solve all of our problems, but I think they do a 
remarkable job in advancing human knowledge. The question is, what is 
the priority for SW/LD research conferences? To comply with whatever 
works best for publishers, or to raise the level of scientific 
communication and discovery in its own field?


IMHO, we have our fundamental priorities mixed up. And, that's clouded 
by the fact that there are some technologies and tooling that work best 
for publishers and print output (at least at this time).


You can have the last say on this thread. I am primarily interested in 
hearing SW/LD research venues making a compromise to help SW/LD research 
go forward.


-Sarven
http://csarven.ca/#i





Re: [ESWC 2015] First Call for Paper

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 13:36, Mauro Dragoni wrote:

Papers should not exceed fifteen (15) pages in length and must be
formatted according to the guidelines for LNCS authors. Papers must be
submitted in PDF (Adobe's Portable Document Format) format.


As I understand it, there is a disconnect between the submission format 
and what ESWC wishes to achieve or encourage [1].


Can someone please elaborate on how forcing researchers to use PDF to 
share their publicly funded knowledge instead of SW/LD technologies and 
tools better fulfills [1], or perhaps even contributes towards the 
Semantic Web vision?


I would like to better discover and use SW research knowledge. ESWC 
encouraging and promoting PDF for knowledge sharing sets an unnecessary 
limit on discovery and use.


Will you consider encouraging the use of Semantic Web / Linked Data 
technologies for Extended Semantic Web Conference paper submissions?


Thanks,

[1] http://2015.eswc-conferences.org/about-eswc2015

-Sarven
http://csarven.ca/#i





Re: [ESWC 2015] First Call for Paper

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 18:12, Fabien Gandon wrote:

Dear Sarven,


Thank you for your response, Fabien.


The scientific articles are presenting scientific achievements in a format that 
is suitable for human consumption.
Documents in a portable format remain the best way to do that for a conference 
today.


I acknowledge the current state of matters for sharing scientific 
knowledge. However, the concern was whether ESWC was willing to promote 
Web native technologies for sharing knowledge, as opposed to solely 
insisting on Adobe's PDF, a desktop native technology.


If my memory serves me correctly, the Web took off not because of PDF, 
but due to plain old simple HTML. You know just as well that HTML was 
intended for scientific knowledge sharing at large scale, for human as 
well as machine consumption.



However:
- all the metadata of the conference are published as linked data e.g.
   http://data.semanticweb.org/conference/eswc/2014/html


This is great. But, don't you think that we can and ought to do better 
than just metadata?



- authors are encouraged to publish, the datasets and algorithms they use in 
their research on the Web following its standards.


I think we all know too well that this is something left as optional 
that very few follow up on. There is no reproducibility police in SW/LD 
venues. Simply put, we can't honestly reproduce the research because all 
of the important atomic components that are discussed in the papers, 
e.g., from hypotheses and variables to conclusions, are not precisely 
identified or easily discoverable. Most of the time, one has to hunt 
down the authors for that information. IMHO, this severely limits 
scientific progress in Web Science.


Will you compromise on the submission such that the submissions can be 
in PDF and/or in HTML(+RDFa)?


Thanks again for considering.

-Sarven
http://csarven.ca/#i





Formats and icing (Was Re: [ESWC 2015] First Call for Paper)

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 19:10, Laura Dawson wrote:

What about EPUB, which is xHTML and has support for Schema.org markup? It
also provides for fixed-layout.


IMO, this particular discussion is not what we should be focusing on. 
And, it almost always detracts from the main topic. There are a number of 
ways to get to Web friendly representations and presentations. EPUB? 
Sure. Whatever floats the author's boat. As long as we can precisely 
identify and discover the items in research papers, that's 
all fine.


I personally don't find the need to set any hard limitations on (X)HTML 
or which vocabularies to use. So, schema.org is not granular enough at 
this time. There are more appropriate ones out there, e.g., 
http://lists.w3.org/Archives/Public/public-lod/2014Jul/0179.html , but 
that doesn't mean that we can't use them along with schema.org.


I favour plain HTML+CSS+RDFa to get things going e.g.:

https://github.com/csarven/linked-research

(I will not dwell on the use of SVG, MathML, JavaScript etc. at this 
point, but you get the picture).
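
To make that concrete, a minimal sketch of what such a document could 
look like (the vocabulary and property choices here are purely 
illustrative, not prescriptive):

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>An Example Linked Research Article</title>
    <link rel="stylesheet" href="lncs.css" media="all" />
  </head>
  <body prefix="dc: http://purl.org/dc/elements/1.1/ dcterms: http://purl.org/dc/terms/">
    <article about="" typeof="dcterms:BibliographicResource">
      <h1 property="dcterms:title">An Example Linked Research Article</h1>
      <p>By <span property="dc:creator">Some Author</span></p>
      <section id="abstract">
        <h2>Abstract</h2>
        <p property="dcterms:abstract">One paragraph that humans and
        machines can both get something out of.</p>
      </section>
    </article>
  </body>
</html>

Any RDFa-aware consumer can lift the title, creator, and abstract out of 
that as plain triples, while a browser just renders an ordinary page.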


The primary focus right now is to have SW/LD venues compromise i.e., not 
insist only on Adobe's PDF, but welcome Web native technologies.


Debating on which Doctype or vocabulary or whatever is like the icing on 
the cake. Can we first bring the flour into our kitchen?


-Sarven
http://csarven.ca/#i





Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 21:05, Luca Matteis wrote:

Dear Sarven,

This stuff is really cool: http://linked-research.270a.info/

Couple of questions: How did you come up with such a close CSS/HTML
template as the LNCS latex version? Did you hand code the CSS to make
it look as close as possible or was it automated by some tool I'm not
aware of?


As venues always give precise instructions on what template to follow e.g.:

http://static.springer.com/sgw/documents/1121537/application/pdf/SPLNPROC+Author+Instructions_Aug2014.pdf

that's exactly what I did. Read it line by line and wrote the CSS for it.

There is no doubt that the CSS can be better. Different browsers for 
instance have varying CSS3 print support.


If you thought http://linked-research.270a.info/ looked cool, why not 
change the <link href="lncs.css"> to acm.css from your browser's 
developer tool.
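
Concretely, the swap amounts to something along these lines (a sketch; 
the actual attributes in the repository may differ):

<!-- LNCS presentation -->
<link rel="stylesheet" href="lncs.css" media="all" />
<!-- swap the href for the ACM presentation -->
<link rel="stylesheet" href="acm.css" media="all" />

Same content, different presentation; nothing else in the document changes.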


What's demanded by the conferences/publishers is an archaic 
presentation. Fixed page length. Fixed view. So be it. That is a small 
subset of what we can achieve using the Web stack.



What you're saying about moving towards RDFa for publishing papers
should definitely be discussed more, however, CSS/HTML still fails in
a lot of things that Latex on the other hand excels at. For example
typography and font kerning/spacing. All that works really well in
latex/pdf, while in HTML you get different results in different
browsers. Journals certainly can't expect inconsistencies. I've seen
templates built in PDF using Latex that you can only dream of using
HTML/CSS. It's just a better set of tools when it comes to
publishing *static* documents, because they were built for static
documents. The Web on the other hand is rarely static. It's an
interactive playground better suited for a DOM structure such as HTML.


Let me ask you to take a step back for a second. Are you convinced that 
there are far more possibilities with LaTeX/PDF for data representation, 
presentation and interaction than HTML+CSS+JavaScript+RDFa+SVG+MathML...? 
Do we really need to battle that out? :) Don't worry, I will. As I'll 
demonstrate in my final PhD dissertation ;)


If PDF was so good at static documents, we'd have the Web of PDFs 
instead of Web of HTMLs. I disagree that the Web is rarely static.


As far as the print precision goes, I agree, CSS3 and browser support 
for printing have a lot of work to do. But, what level of precision are 
the SW/LD conferences worried about providing to publishers? I 
recall that Springer for instance asks for only the PDF. What that 
practically means is that one can go from LaTeX, HTML+CSS, or dare I 
say, JPEG to PDF. There is no precision police for rendering. Most 
people and organizations have printers that have 300-600 DPI support.



Isn't there just a standard way to add RDF markup to a PDF file?


Maybe. But, that's totally backwards, IMO.

-Sarven






Technical challenges (Was Re: [ESWC 2015] First Call for Paper)

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 22:32, Pablo N. Mendes wrote:


It may help to preemptively address concerns here. Does anyone have a
HTML+CSS(+RDFa) template that looks exactly like the LNCS-formatted
PDFs? Can we show that papers using this template:
- look consistent with each other (follow the LNCS typesetting instructions)
- look the same as the PDF counterparts
- look the same in any reader
- look the same on screen and printed
- can be read both online and offline
- have the same or smaller file size
- make it easy to share with others (all in one file?)

Can LaTeX to HTML be achieved easily with this template? Or at least is
it as easy to write this HTML as it is to write in LaTeX?

I feel like this thread warrants a manifesto with a backing github
repo where everybody interested can chip in.


The core of your concerns was addressed over the past few years in 
different ways on this mailing list. When some posed the situation as a 
technological problem, I created some templates in LNCS and ACM 
styles:


https://github.com/csarven/linked-research

I reached out to OCs, supervisors, and authors. They all have a part in 
this. I even wrote manifestos:


* http://csarven.ca/linked-research

* http://csarven.ca/call-for-linked-research


How about we try to solve a different problem? The one that I've posed: 
will SW/LD conferences encourage the community to eat their own dogfood 
for papers? We can certainly improve on whatever needs to be improved 
over time. The problem is that, if SW/LD technologies are not even 
welcome as a means to share scientific knowledge at these conferences, it 
is irrelevant to worry about the technological comparisons.


We have a Social Problem 101. Period.

-Sarven
http://csarven.ca/#i





Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper)

2014-10-01 Thread Sarven Capadisli

On 2014-10-01 21:51, Kingsley Idehen wrote:

On 10/1/14 2:42 PM, Sarven Capadisli wrote:

can't use them along with schema.org.

I favour plain HTML+CSS+RDFa to get things going e.g.:

https://github.com/csarven/linked-research


What about:

HTML+CSS+(RDFa | Microdata | JSON-LD | TURTLE) ?

Basically, we have to get to:

HTML+CSS+(Any RDF Notation) .


Sure, why not!


The above is possible because we now have standardization of <link/>,
<script/> etc. in HTML that makes this possible.

I wouldn't single out RDFa in this quest.

History has shown that whenever we single out anything at the
notation level, we inevitably open up a new format-centric war. These
wars simply protract all the confusion that swirls around RDF :)


I agree and that's all fine. I've only proposed one particular solution 
that made the most sense to me. Going from one to another is not an 
issue either. People are going to do whatever is convenient or suitable 
for them in the end (just like LaTeX-PDF).
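
For instance, a sketch of what that looks like in practice (names and 
values illustrative only): the same HTML page can carry its RDF as an 
embedded data island, here as JSON-LD, though any notation could be 
dropped in the same way:

<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "ScholarlyArticle",
  "name": "An Example Linked Research Article",
  "creator": "Some Author"
}
</script>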


The primary problem is not about solving x in HTML+CSS+x, but that 
HTML+CSS is not even an option to begin with for major international 
Semantic Web conferences to better preserve and foster smart 
identification and discovery of research components.  Reproducibility 
suffers along the way. There is absolutely nothing worthwhile we can 
query for from past SW/LD research.


-Sarven





Re: LD4KD CFP - Linked Data for Knowledge Discovery Workshop at ECML/PKDD

2014-09-19 Thread Sarven Capadisli

On 2014-04-11 10:33, Mathieu d'Aquin wrote:

*SUBMISSIONS*

Articles should be written following the Springer LNCS template (see
authors instructions at
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0) and can be
up to 10 pages in length for research papers or 5 pages for position
papers, including figures and references.

Submissions are exclusively admitted electronically, in PDF format,
through the EasyChair system. The submission site is
https://www.easychair.org/conferences/?conf=ld4kd


IMHO, the fundamental problem with this workshop is that it is promoting 
PDF for knowledge discovery.


The information within the PDF submissions is not Linked Data friendly, 
and hence the spirit of the workshop is not put to its full potential.


If the workshop wants to promote the Linked Data design principles 
and/or the technology stack for knowledge discovery, it should be 
responsible enough to try to publish the knowledge that it itself 
gathers in a Linked Data manner.


-Sarven
http://csarven.ca/#i





Re: LD4KD CFP - Linked Data for Knowledge Discovery Workshop at ECML/PKDD

2014-09-19 Thread Sarven Capadisli

On 2014-09-19 14:27, Mathieu d'Aquin wrote:

Hi,

[I responded on twitter in more or less the same line, so this is just a
short summary]


Thanks for your response!


I see that you are asking similar questions to many others, so I
probably won't add much here. Basically, my answer is: I agree with the
principles you are trying to put forward, but I don't think it is such
an obvious thing to do. Workshops are for exchange between people. If
the process of engaging with a workshop is made more complicated, it is
kind of defeating the purpose. Also, this sort of thing is happening,
slowly, but it is a process which obviously would rather be bootstrapped
in higher impact venues.


I tend to consider all levels of information exchange sufficiently 
valuable that they can make an impact on what we are all working towards. 
But, I guess we can both agree that there is no need to debate this 
particular matter.



Finally, I see that you are part of the organising committee of
SemStats, which as far as I can tell also promotes PDF as the
(only) publication format - maybe what you could do to help, in addition
to sending emails to mailing lists, would be to show us the way?


SemStats, like some other workshops, e.g., COLD, accepts (X)HTML+RDFa 
documents as long as their presentation is consistent with 
that of the PDF versions. This is to encourage the use of the LD/Web 
friendly technology stack (as opposed to PDF, which is 
fundamentally intended for the desktop or print). Where and how the 
publication of the proceedings is conducted is entirely orthogonal to 
having the source of the research documents machine-friendly.


While I am not personally in favour of seeing PDFs flying around about 
Linked Data, I do acknowledge the middle-ground given the state of the 
publishing workflow. I think the middle-ground for the time being is 
that research documents can first be published using the Web stack in 
an honest manner, and, at the same time, a PDF copy of that 
document can be created (essentially for free) to fulfill the 
(hopefully) temporary requirements of the research venues.


See also: http://csarven.ca/call-for-linked-research

The question is, will the research venues, like the ones you are 
investing a lot of time in, allow Web-friendly submissions in 
addition to the PDF? Or will they continue to insist on submissions only 
in PDF? What steps or responsibilities should the research venues on 
Linked Data take on in order to come slightly closer to the 
SemWeb/LinkedData vision?


Let me stress here again (as I do in almost every conversation) that we 
are not dealing with a technology problem here. It is absurd to even think 
that the Linked Data folks cannot cope with anything but LaTeX or their 
WYSIWYG editors. It is equally silly to think that there isn't 
sufficient or adequate tooling to essentially publish Web pages.


https://github.com/csarven/linked-research is only one modest way of 
dealing with these technological problems.


There is valuable information in the publicly funded research on Linked 
Data that's being put forward, yet we are defaulting to PDF. Why? 
Because the publisher said so? The true publishers of the research 
documents are the authors, not the organizations that are laughing their 
way to the bank. ;)



Thanks,
Mathieu.

ps: I usually don't argue on mailing lists, and will most of the time
miss emails to a mailing list even if they are directly addressed to me,
so consider this message an exception.


Acknowledged. My intention is not to argue, but to have an open 
discussion. If this or similar mailing lists are not for that, I find it 
rather awkward to use this mailing list for the sole purpose of 
receiving CfPs in PDF about Linked Data ;)


Thanks again,

-Sarven
http://csarven.ca/#i






Deadline Extended (Re: SemStats 2014 Call for Challenge)

2014-09-05 Thread Sarven Capadisli

On 2014-08-05 09:48, Sarven Capadisli wrote:

SemStats 2014 Call for Challenge


Second International Workshop on Semantic Statistics (SemStats 2014)
Workshop website: http://semstats.org/
Event hashtags: #SemStats #ISWC2014

in conjunction with

ISWC 2014
The 13th International Semantic Web Conference
Riva del Garda - Trentino, Italy, October 19-23, 2014
http://iswc2014.semanticweb.org/



The deadline for participants to submit their short papers and
application is Sun 7th September, 2014, 23:59 Hawaii Time.


http://semstats2014.wordpress.com/2014/08/05/semstats-2014-call-for-challenge/

Deadline extended to 2014-09-30 \o/

-Sarven



Re: CfP: VISUAL workshop @ EKAW

2014-09-04 Thread Sarven Capadisli

On 2014-09-04 11:36, Steffen Lohmann wrote:

Submission Guidelines
==

Paper submission and reviewing for this workshop will be electronic via
EasyChair. The papers should be written in English, following Springer
LNCS format, and be submitted in PDF.


I am struggling to understand the role that PDF plays towards Linked 
Science.


Would you mind helping me understand:

* What is Linked Science?

* How does PDF (better?) contribute towards fulfilling the Linked 
Science promise in comparison to the alternative methods?


* At what granularity is the information in the papers that's submitted 
to this Linked Science workshop preserved? Which information is not? 
And, most importantly, which information should be preserved for future 
research(ers)? What was your decision process?



Thanks,

-Sarven
http://csarven.ca/#i





Re: Updated LOD Cloud Diagram - what is the message?

2014-08-18 Thread Sarven Capadisli

On 2014-08-18 11:06, Christian Bizer wrote:

So we don’t plan to push a specific message with the diagram, but I
agree with you that the release of the diagram could be a good occasion
for the community to discuss the possible messages/conclusions that one
could draw from it and I would be happy if more people would comment on
this.


Even if there is no explicit message, there is one due to the diagram's 
history. The current diagram that's about to be released is not a 
continuation of the 2011 diagram. However, it comes across as such, 
since the diagram has the same presentation. I am not making a case for 
whether one or the other is more appropriate for the purpose it is 
trying to fulfil, but that there is a clear distinction between the 
underlying information, and that deserves extra attention.



Personally, I think it is quite interesting to compare the deployment of
Microdata/RDFa/Microformats and Linked Data on the Web. We also
investigated the deployment of Microdata/RDFa/Microformats  [1][2] and
the comparison currently looks like this:


Why list RDFa along with Micro*? More importantly, why remove it from 
the other first-class Linked Data?



Thus, it makes sense that we see Linked Data adoption within communities
that have an interest in making their data easy to use and thus are
willing to invest effort into this, like libraries, government and
science (with life science and language processing being the first
communities adopting the technologies) and social networking.


You do not truly believe that, do you? If you thought that Linked Data 
was a worthwhile effort, you would be delivering the research/science 
behind the updated diagram document as such in a machine-friendly manner.


The question is, why are you not publishing your science using the 
available Linked Data stack?



This are my two cents to the overall discussion and I would be very
happy to hear what others think about the message that can be drawn from
the new diagram.


http://lists.w3.org/Archives/Public/public-lod/2014Jul/0143.html awaits 
your kind reply.


-Sarven
http://csarven.ca/#i





Re: Linked SDMX Data

2014-08-15 Thread Sarven Capadisli

On 2014-08-11 22:58, Gannon Dick wrote:

Sorry for the x-post


Don't be. It is a natural thing.


Hi Sarven,

I noticed you used GeoNames for the Australian Bureau of Statistics Linked 
Data hack mentioned below.  GeoNames does much useful work ... but everyone in the 
Linked Data business could use a little help.

Domains - in theory, the countries of the world are a group of (federalized 
data set of ...) (groups of) Court Houses, Jurisdictions, keyed with two and 
three letter acronyms (ISO 3166).  This set for all practical purposes is a 
Unicode Code Page, but instead of (16x16)=256 members there are (26x26)=676 
Latin Alphabet Capital Letters.  Statistical metrics at the domain level are 
manipulated with Linear Algebra and Linear Programming. Diacritics (Côte 
d'Ivoire) or alternate forms (Ivory Coast) do nothing semantically useful, the 
acronym is the leveler.

So, I rewrote the GeoName table (http://www.geonames.org/countries/) to be:
1) Unicode compliant for XML (HTML entities are HEX escaped)
2) The Geo's, Country Profiles, whatever are local links.  I left those as is 
and included/matched the MARC System / US Library of Congress Linked Data 
Service URI's (http://id.loc.gov/vocabulary/countries.html).
3) Finally, I used an SQL RDB to do an Outer Join on the Code Set - all 676 possibilities.  Adding a three 
character code synonym does not increase the code page size.  It is then possible to split this 
registry into lists of codes 1) Present, 2) Missing and 3) Slack (in the Linear 
Programming usage).
4) Put the files in (FODS - (Flat XML) Open Document Spreadsheets format) so 
that European Civil Servants can not whine about data quality (got your back, 
DERI, you too ABS).

http://www.rustprivacy.org/2014/balance/gts/geonames_domains.zip

Unfortunately, when RDF Lists of Place Names are filtered through previously 
written applications the result is often unhelpful additions, however these 
steps should ameliorate the problem significantly.

--Gannon


Thanks Gannon. If I understand correctly, you got around to implementing 
your suggestion back in 2012Q1:


http://lists.w3.org/Archives/Public/public-lod/2012Mar/0108.html

Care to clarify what I should make of:

http://www.rustprivacy.org/2012/urn-lex/artificial-bureaucracy.html

?

-Sarven
http://csarven.ca/#i






Re: Linked SDMX Data

2014-08-11 Thread Sarven Capadisli

On 2014-08-05 12:08, Sarven Capadisli wrote:

On 2014-04-23 15:31, Sarven Capadisli wrote:

On 2014-04-22 14:18, Sarven Capadisli wrote:

On 2013-08-08 15:17, Sarven Capadisli wrote:

On 03/08/2013 01:04 PM, Sarven Capadisli wrote:

On 02/15/2013 02:42 PM, Sarven Capadisli wrote:

Ahoy hoy,

OECD Linked Data:
http://oecd.270a.info/

BFS Linked Data:
http://bfs.270a.info/

FAO Linked Data:
http://fao.270a.info/

Linked SDMX Data:
http://csarven.ca/linked-sdmx-data


ECB Linked Data:
http://ecb.270a.info/


IMF Linked Data:
http://imf.270a.info/


UIS Linked Data:
http://uis.270a.info/


FRB Linked Data:
http://frb.270a.info/


BIS Linked Data:
http://bis.270a.info/


ABS Linked Data:
http://abs.270a.info/

-Sarven
http://csarven.ca/#i





Re: SemStats 2014 Call for Challenge

2014-08-07 Thread Sarven Capadisli

On 2014-08-06 21:17, Gannon Dick wrote:

Hi Sarven,

The US Census Bureau has piles of data available via an API.

http://www.census.gov/developers/

You need a key.  I do not see anything in the terms of service about citizenship
http://www.census.gov/data/developers/about/terms-of-service.html

I think the key motivation is to avoid DOS attacks.  The US Census Bureau is 
known for its uncharacteristic moments of lucidity with respect to Government Work.

Economic Data release in the US has always been a sensitive subject.  The 
problems arose about a decade before the European Union, oops, I meant the 
Congress of Vienna ;-)
The story is here: http://www.census.gov/prod/2003pubs/conmono2.pdf
This document also explains in detail exactly what they mean by Statistical 
Safeguards for non-disclosure of confidential business information.

--Gannon


Hi Gannon, thank you for the heads-up on the US Census Bureau data and the summary.

The challenge is intended to be flexible about the statistical datasets 
which can be used. If anyone would like to use US Census Bureau data or 
some other Census data by legal and reusable means, they are welcome to 
do so for the Census track. The same holds true for the Open track, with 
the restriction on Census data lifted. We have merely highlighted 
some of the datasets/dataspaces out there as examples.


-Sarven
http://csarven.ca/#i





SemStats 2014 Call for Challenge

2014-08-05 Thread Sarven Capadisli

SemStats 2014 Call for Challenge


Second International Workshop on Semantic Statistics (SemStats 2014)
Workshop website: http://semstats.org/
Event hashtags: #SemStats #ISWC2014

in conjunction with

ISWC 2014
The 13th International Semantic Web Conference
Riva del Garda - Trentino, Italy, October 19-23, 2014
http://iswc2014.semanticweb.org/

Summary
---
The SemStats Challenge is back with more action! It is organized in the 
context of the SemStats 2014 workshop. Participants are invited to apply 
statistical techniques and semantic web technologies within one of two 
possible tracks, namely the Census Data Track and Open Track. Following 
up on the success of last year's Challenge, this year, the Census Data 
Track will have data from France, Italy, and Ireland. We would also like 
to introduce the new Open Track, where any type of statistical data of 
your choice may be used in the challenge.


The challenge will consist of the realization of mashups or 
visualizations, but also of comparisons, analytics, alignment and 
enrichment of the data and concepts involved in statistical data (see 
below for the data made available and additional requirements).


The deadline for participants to submit their short papers and 
application is Sun 7th September, 2014, 23:59 Hawaii Time.


It is strongly suggested to all challenge participants to send contact 
information to semstats2...@easychair.org in order to be kept informed 
in case of any changes in the data provided. For any questions on the 
challenge, please contact semstats2...@easychair.org.


Census Data Track
-
We would like to point you to plenty of raw data. The conversion process 
will be considered as part of the challenge.


* Istat (Italian National Institute of Statistics) offers Census 1991, 
2001, 2011 data and metadata: 
http://www.istat.it/it/archivio/104317#variabili_censuarie (see 
"Variabili censuarie / Censimento della popolazione e delle 
abitazioni"), which gives the population count by age range and sex at a 
very detailed geographic level.


* INSEE (National Institute of Statistics and Economic Studies) can 
provide different things:


1. Detailed results for Census 2011: 
http://insee.fr/fr/themes/detail.asp?reg_id=0&ref_id=fd-RP2011&page=fichiers_detail/RP2011/telechargement.htm 
giving results on individuals only at the region level but with a great 
number of other variables (see 
http://insee.fr/fr/ppp/bases-de-donnees/fichiers_detail/RP2011/doc/contenu_RP2011_INDREG.pdf)


2. Detailed results for Census 2010: 
http://insee.fr/fr/themes/detail.asp?reg_id=0&ref_id=fd-RP2010&page=fichiers_detail/RP2010/telechargement.htm 
with, for example, results on individuals at a smaller geographic level


3. Key figures for Census 2011 on different themes at the 
municipality level: 
http://insee.fr/fr/bases-de-donnees/default.asp?page=recensement/resultats/2011/donnees-detaillees-recensement-2011.htm


* ABS (Australian Bureau of Statistics) offers Census 2011 data at 
http://stat.abs.gov.au/ . Data that is in particularly of interest to 
this challenge can be found by navigating to: Social Statistics > 
2011 Census of Population and Housing > Time Series Profiles (Local 
Government Areas) > T03 Age by Sex (LGA)


* CSO (Central Statistics Office) Ireland's Census 2011 data and 
metadata available as Linked Data: http://data.cso.ie/.


* You are welcome to use any other Census data whether it is Linked Data 
based or not


Open Track
--
There is one essential requirement for the Open Track: papers must 
describe a publicly available application. We would love to see everyone 
play and learn from what you have created. You are welcome to use any 
statistical data whether it is already in Linked Data shape or not! 
While you are at it, why not combine it with data from other domains?


Here are some statistical linked data spaces:

* http://270a.info/
* http://eurostat.linked-statistics.org/
* http://data.cso.ie/
* http://linked-statistics.gr/
* http://linkedspending.aksw.org/
* http://cedar-project.nl/
* http://datahub.io/ - Whatever is here ;)





Re: Linked SDMX Data

2014-08-05 Thread Sarven Capadisli

On 2014-04-23 15:31, Sarven Capadisli wrote:

On 2014-04-22 14:18, Sarven Capadisli wrote:

On 2013-08-08 15:17, Sarven Capadisli wrote:

On 03/08/2013 01:04 PM, Sarven Capadisli wrote:

On 02/15/2013 02:42 PM, Sarven Capadisli wrote:

Ahoy hoy,

OECD Linked Data:
http://oecd.270a.info/

BFS Linked Data:
http://bfs.270a.info/

FAO Linked Data:
http://fao.270a.info/

Linked SDMX Data:
http://csarven.ca/linked-sdmx-data


ECB Linked Data:
http://ecb.270a.info/


IMF Linked Data:
http://imf.270a.info/


UIS Linked Data:
http://uis.270a.info/


FRB Linked Data:
http://frb.270a.info/


BIS Linked Data:
http://bis.270a.info/

-Sarven
http://csarven.ca/#i



smime.p7s
Description: S/MIME Cryptographic Signature


Re: Call for Linked Research

2014-08-02 Thread Sarven Capadisli

On 2014-07-28 16:58, Spencer Tom Tafadzwa Chirume wrote:

Awesome initiative. It would help to have examples to point to for
reuse. You have this on GitHub?


Sure. Just to be clear, again: this whole thing is not a technical 
problem we are dealing with. *It would be senseless and a complete waste 
of time to point out that publishing on the Web using Web native 
technologies and tools is successful*. Any discussion on whether Linked 
Data researchers can manage to put a document up on the Web or not is 
insulting to begin with.


So, there are a lot of ways to realize it. There is nothing really new 
here. It can be as simple as making blog posts somewhere. Add more 
semantics to the document as you go to identify and capture the 
essentials. It really doesn't matter what the starting point is as long 
as it uses technologies and standards that were designed with the Web in 
mind.


For this particular task, I prefer the HTML+RDFa+CSS route. It has the 
advantage of showing something to human users and being machine-friendly 
at the same time, using a single document. Everything that has 
to do with the research paper can go in there. It can of course point 
at the resources that it talks about. As for making it look pretty for 
print or formal submissions to conferences/workshops, it can be 
styled with CSS. And, that part is already done (waiting for the LD 
researchers to pick it up) for LNCS, ACM, and thesis-like styles:


https://github.com/csarven/linked-research

Try the print view of the example URLs. In fact, this:

http://csarven.ca/call-for-linked-research

is in LNCS.
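
For the curious, the print side of those stylesheets is ordinary CSS 
along these lines (a sketch with illustrative values, not the actual 
LNCS metrics; see the repository for the real thing):

/* print presentation for an HTML article */
@page { size: A4; margin: 5.2cm 4.4cm; } /* fixed page box for the print file */
@media print {
  body { font: 10pt/12pt "Times New Roman", serif; }
  h1 { font-size: 14pt; text-align: center; }
  a { color: black; text-decoration: none; } /* no coloured links on paper */
}

Print the page to a PDF file from the browser, and that PDF is what gets 
handed to the publisher.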


Copy it. Kindly send pull requests.


-Sarven
http://csarven.ca/#i








Re: Call for Linked Research

2014-08-02 Thread Sarven Capadisli

On 2014-07-29 23:35, Ivan Shmakov wrote:

Sarven Capadisli i...@csarven.ca writes:
On 2014-07-29 09:43, Andrea Perego wrote:


   You might consider including in your call an explicit reference to
   nanopublications [1] as an example of how to address point (5).

   About source code, there's a project, SciForge [2], working on the
   idea of making scientific software citable.

   My two cents...

   [1] http://nanopub.org/
   [2] 
http://www.gfz-potsdam.de/en/research/organizational-units/technology-transfer-centres/cegit/projects/sciforge/

   Thanks for the heads-up, Andrea.  The article on my site has an open
   comment system, which is intended to have an open discussion or have
   suggestions for the others (like the ones you've proposed).  Not that
   I'm opposed to continuing the discussion here, but you are welcome to
   contribute there so that the next person that comes along can get a
   hold of that information.

Not that I have much to say on the subject itself, but I’d like
to note that, to my mind, a major issue with “on-site” comments
is that there is rarely any standard way to “mirror” them
somewhere else.

Alas, Web sites come and go (and the Internet Archive cannot
always be relied upon), while mailing list messages survive in
the subscribers’ email archives, – and at times, could be
downloaded via NNTP from Gmane just as well.

[…]



Let's contrast on-site comments with:

1. Blind reviews.

2. Reviews that do not even get to see the light of day.

So, feedback of all sorts (including reviews) with people's names 
attached to them, where anyone can read and foster discussion, is an 
improvement over the current state of things.


I would even favour reviews held out in the open in a public mailing list.

But, what we see now is no reviewer names and rejections (which is by 
far the greatest portion of submitted research). This usually leads 
authors to hold back on their work for further improvements (even if it 
was good work to begin with but happened to not meet some artificial 
cut-off point) or to retry at other venues. I have a hard time thinking 
that rejection reviews from 1-3 anonymous reviewers are better than 
putting the work out there to get a broader sense of the quality of the 
work. Those same reviews can still be conducted as on-site comments 
with those same reviewers, in addition to everyone else that's 
interested in the work.


Anyway, there are venues that already do this out in the open, e.g., the 
Semantic Web Journal, and I think that's great.


Archiving reviews/comments is an important, but orthogonal issue here.

-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-30 Thread Sarven Capadisli

On 2014-07-30 09:25, Giovanni Tummarello wrote:


Like I said, it is mind-boggling to think that the SW/LD research
community is stuck on 1-star Linked Data. Is that sinking in yet?


So Sarven let us be rational and pick Occam's razor style simplest
explanation

Could we all be stupid or could it be instead an indicator of how little
relevance LD is perceived to have (at this task, as for many others)?


I think the most obvious and simple reason is:

The publishing industry dictates how things should be written down. 
Conferences pass on the orders. Authors follow suit.


It just turns out that PDF is the first-class format for the 
publishers. If it was something else, authors would do that. So, no, it 
is not due to the shortcomings of LD as you put it - that's a more 
complicated possible explanation.



If there was a pain that LD could really solve people would have used it
without anyone calling for it. The myth of "put the data and it will be
useful" was nice to believe in for a while (early LOD efforts, 2007 or
so) but has - quite unfortunately - dissolved long since then.


You are misinterpreting my central argument.

The primary argument is not even about LD. It actually centers around 
authors taking control of their own work at all steps. The 
recommendation to use LD is just one way to foster intelligent knowledge 
preservation and discovery.


I would even go one step back and say that whatever is Web-native, e.g., an 
ordinary HTML page, is still an order of magnitude improvement over PDF 
(which is a desktop-native format). Do not tell me that PDF is a better 
way to disseminate knowledge on the Web than HTML. With HTML, there is 
at least one path towards LD. That option is open if people want to take 
it. If PDF was the better way, we'd see people using LaTeX/PDF to create 
their Web pages.


The Occam's razor: authors are not adopting LD because they are told to 
use something else.



All this while things like schema.org have exploded
on the web and form what is a real semantic web of marked up pages. (but
quite sadly, not a mention of this in this mailing list).


A lot of things happen outside of the SW/LD mailing list. Computer Science, 
for instance. What's your point?


Off topic: When schema.org was put together, it had a history to look 
at. Smart folks behind it came up with a solution that would work for 
many. The industry was ready for it. Let's not forget all the efforts 
that tried to capture and discover information on ordinary web pages, 
e.g., Yahoo! SearchMonkey, microformats.


I am not trying to sell a vocabulary here or some exact how-to list on 
how you should construct a web page or LD paper. Start somewhere that's 
Web-friendly and make sure that you have total control over your 
words. No one is stopping anyone from passing a PDF copy to the 
conference along the way. We are able to do this now.



my2c If you want to have some impact then your best bet by orders of
magnitude is to liaise with Dan Brickley (Google) to have some more
specific schema.org markup for scientific experiments.


I do not think that effort is the best way to use our energy right now. 
The problem is not "I can't put my research on a Web page because there 
are no nice clean vocabularies for me and everyone else to use". The 
problem is "Can we get authors to start publishing their research on 
their own?". As I've said numerous times, this is not a technology 
problem. I think we have a long list of tools and services (e.g., 
WordPress), and plenty of great vocabularies (coming out of this 
community), and the technical means to do it ourselves.


So, no, I completely reject your argument that the way things are is due 
to LD's shortcomings.


The L(O)D efforts are successful - from GYM's search/knowledge graphs, 
to governments/the public sector making reasonable efforts to get their 
stuff up in a Web-friendly way, i.e., unchaining themselves from PDF/Word 
etc. They are on board (some taking longer than others, but still). There 
is good and noticeable progress. It is the LD research community that's 
stuck, and that's embarrassing! So, what is your real argument?


-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-29 Thread Sarven Capadisli

On 2014-07-29 09:08, Andrea Splendiani wrote:

I agree with you.
When I hear about research I tend to think about Life Sciences research (my bias). In 
this context, I see a lot of advocacy for reproducibility of research. But when I see what people 
do, for most of them, data elaboration is only one part of the process. So I was wondering, in this 
context, how relevant publication is to the reproducibility of results.
Probably in proportion not much.

Otherwise you are right. In more computational/engineering oriented areas, 
where reproducibility only depends on information... information should be 
there.


I do not think there is a need to give different treatments to the types 
of science - is there? (It is an honest question, I am not an expert in 
this area). I figure that something either follows the scientific 
method, or it doesn't.


If how something is published hinders researchers from taking a closer 
look at what was done, should that not be a concern?


If a Web researcher can not deliver their work in full, it would be 
irresponsible to think that it is up to interested parties to get it in 
full. It either exists as complete as possible, or it just doesn't. And, 
frankly, if some research information can not be obtained or reproduced 
easily, it does not need to be cited. Otherwise, I do not see how that 
qualifies as science.


So, when publicly funded research gets summarized in a PDF, accessible 
through the publisher's site, and all of the dots to reproduce the work 
is unavailable at ease, then are we using the term Web Science in an 
honest way?


PDF is a desktop-native document. It is true that all sorts of stuff can 
be jammed inside. Its supporters are trying to make sure that it plays 
well with the Web. While those are good efforts (making the best out of 
the situation), it is like placing a band-aid on a severe wound. PDF on 
and off the Web breaks good user-experience patterns no matter how you 
look at it. One cannot simply navigate through a research 
publication to all of its atomic parts in a ubiquitous fashion. PDF can 
not deliver that. It is more likely that your off the shelf Web browser 
can. Supporters of PDF will come out and say that, well, it can have 
hyperlinks, or it can even render in some Web browser. Do they really 
think that navigating between a PDF document (whether on desktop or 
viewed in a Web browser) and a hypermedia resource is considered good 
design?


Why shouldn't we remove all barriers to reproducibility?

-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-29 Thread Sarven Capadisli

On 2014-07-29 09:43, Andrea Perego wrote:

You might consider including in your call an explicit reference to
nanopublications [1] as an example of how to address point (5).

About source code, there's a project, SciForge [2], working on the
idea of making scientific software citable.

My two cents...



[1]http://nanopub.org/
[2]http://www.gfz-potsdam.de/en/research/organizational-units/technology-transfer-centres/cegit/projects/sciforge/


Thanks for the heads-up, Andrea. The article on my site has an open 
comment system, which is intended for open discussion and for 
suggestions like the ones you've proposed. Not that I'm 
opposed to continuing the discussion here, but you are welcome to 
contribute there so that the next person that comes along can get a hold 
of that information.


It wasn't my intention to enumerate all workshops that play nicely 
towards open science, all vocabularies or exact tooling to use, or all 
efforts out there, e.g., nanopublications.


You have just cited two hyperlinks in that email. Those URLs are 
accessible by anything in existence that can make an HTTP GET request. 
Pardon my ignorance, but why do we need out-of-band software when we have 
something that works remarkably well?


-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-29 Thread Sarven Capadisli

On 2014-07-29 11:56, Hugh Glaser wrote:

This is of course an excellent initiative.
But I worry that it feels like people are talking about building stuff from 
scratch, or even lashing things together.

Is it really the case that a typical research approach to what you are calling 
Linked Research doesn’t turn up theories and systems that can inform what we do?

What I think you are talking about is what I think is commonly called e-Science.
And there is a vast body of research on this topic.
This initiative also impinges on the Open Archives/Access/Repositories 
movements, who are deeply concerned about how to capture all research outputs. 
See for example http://www.openarchives.org/ore/

In e-Science I know of http://www.myexperiment.org, for example, which has been 
doing what I think is very related stuff for 6 or 7 years now, with significant 
funding, so is a mature system.
And, of course, it is compatible with all our Linked Data goodness (I hope).
Eg http://www.myexperiment.org/workflows/59
We could do worse than look to see what they can do for us?
And it appears that things can be skinned within the system: 
http://www.myexperiment.org/packs/106

You are of course right, that it is a social problem, rather than a technical 
problem; this is why others’ experience in solving the social problem is of 
great interest.

Maybe myExperiment or a related system would do what you want pretty much out 
of the box?

Note that it goes even further than you are suggesting, as it has facilities to 
allow other researchers to actually run the code/workflows.

It would take us years to get anywhere close to this sort of thing, unless we 
(LD people) could find serious resources.
And I suspect we would end up with something that looks very similar!

Very best
Hugh


Thanks Hugh. Those are great examples, and all the power to the people 
working hard at them. And you are right about the eScience bit. 
Just to clarify for anyone that's following this thread:


It is not my intention to overlook or devalue existing efforts similar 
to what I'm proposing. Nor is it my intention to re-brand 
anything. This is simply a Call to DIY.


If conferences and publishers set the limitations on how we can combine 
our knowledge and efforts, that's a clear sign to take the control 
back. They are not delivering on anything. We can do better.


You publish your work in whatever LD-friendly way you can. The 
effort that goes into it is what you and others get back. If you are 
content with not being able to discover interesting or relevant parts of 
other people's knowledge using the technologies and tools that are in 
front of you, there is nothing to debate about here.


Like I said, it is mind-boggling to think that the SW/LD research 
community is stuck on 1-star Linked Data. Is that sinking in yet?


-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-28 Thread Sarven Capadisli

On 2014-07-28 16:16, Paul Houle wrote:

I'd add to all of this publishing the raw data, source code, and
industrialized procedures so that results are truly reproducible, as
few results in science actually are.



On Mon, Jul 28, 2014 at 9:01 AM, Sarven Capadisli i...@csarven.ca wrote:

2. Publish your progress and work following the Linked Data design
principles. Create a URI for everything that is of some value to you and may
be to others e.g., hypothesis, workflow steps, variables, provenance,
results etc.



Agreed, but I think point 2 covers that. It was not my intention to give 
complete coverage of the scientific method. Covering reproducibility 
is a given. It also goes for making sure that all of the publicly funded 
research material is accessible and free. And, one should not have to go 
through a 3rd party service (gatekeepers) to get a hold of someone 
else's knowledge.


If we cannot have open and free access to someone else's research, or 
reproduce it (within a reasonable amount of effort), IMO, that research 
*does not exist*. That may not be a popular opinion out there, but I 
fail to see how such inaccessible work would qualify as scientific. 
Having to create an account on a publisher's site, and pay for the 
material, is not what I consider accessible. Whether that payment is 
withdrawn directly from my account or indirectly from the institution 
I'm with (which still comes out of my pocket).


Anyway, this is discussed in great detail elsewhere by a lot of smart 
folks. Like I said, I had different intentions in my proposal, i.e., DIY. 
Control your own publishing on the Web. If you must, hand out a copy, 
e.g., PDF, to fulfil your h-index high-score.


-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-28 Thread Sarven Capadisli

On 2014-07-28 18:36, Bernard Vatant wrote:

Hi Sarven

On point 2 : Publish your progress and work following the Linked Data
design principles. Create a URI for everything that is of some value to
you and may be to others e.g., hypothesis, workflow steps, variables,
provenance, results etc.

For such publication to be really interoperable, all this should rely on
shared vocabularies. This important point is not obvious in your call.
Which vocabularies would you suggest?
Semanticscience Integrated Ontology is a good candidate for this
http://lov.okfn.org/dataset/lov/details/vocabulary_sio.html
https://code.google.com/p/semanticscience/wiki/SIO


Certainly SIO is a great candidate. But, again, it was not my 
intention to declare what should be used, whether that has to do with 
Web Science / Semantic Web / Linked Data or not. People should use 
whatever is most suitable to represent and refer to their work. They can 
make that judgement for their work better than others can.


Having said that, I use and would like to suggest that people consider 
using the following:


* Semanticscience Integrated Ontology: http://semanticscience.org/
* Semantic Publishing and Referencing: http://purl.org/spar
* Provenance Ontology: http://www.w3.org/TR/prov-o
* Open Provenance Model for Workflows: http://www.opmw.org/ontology
* DC Terms: http://purl.org/dc/terms

If there are other great ones out there that are suitable for capturing 
research material, please reply here. I've excluded some common ones 
like FOAF, VoID etc. above.
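
For a concrete flavour of how a few of these could fit together, here is 
a minimal Turtle sketch; every example.org URI below is a hypothetical 
placeholder, not a recommendation:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix fabio:   <http://purl.org/spar/fabio/> .
@prefix cito:    <http://purl.org/spar/cito/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .

# A research article, its author, one typed citation, and its provenance.
<http://example.org/article> a fabio:ResearchPaper ;
    dcterms:title "An Example Article"@en ;
    dcterms:creator <http://example.org/alice#i> ;
    cito:citesAsEvidence <http://example.org/other-article> ;
    prov:wasDerivedFrom <http://example.org/experiment/1> .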


-Sarven
http://csarven.ca/#i





Re: Call for Linked Research

2014-07-28 Thread Sarven Capadisli

On 2014-07-29 00:45, Andrea Splendiani wrote:

while I agree with you all, I was thinking: is the lack of reproducibility an 
issue due to the way results are represented ?
Apart for some fields (e.g.: bioinformatics), materials, samples, experience 
are probably more relevant and much harder to reproduce.


I think that depends on who we ask and how much they care about 
reproducibility.


*IMHO*, the SW/LD research scene is not exactly hard science. It leans 
more on engineering and development than on following the pure scientific 
method. The majority of the research coming out of this area focuses 
on showing positive and useful results, and that appears to materialize 
in ways like:


* My code can beat up your code.
* We have something that is ground breaking.
* We have some positive results, and came up with a research problem.

How often do you come across negative results in the proceedings i.e., 
some *exploration* which ended up at a dead end?


It is trivial to find the evaluation section of a paper replaced 
with benchmarks. Kjetil pointed at this issue eloquently at ISWC 2013: 
http://folk.uio.no/kjekje/2013/iswc.pdf , emphasizing the need for 
careful design of experiments where required.


In other cases, one practically needs to run after the authors to get 
1) a copy of the original paper, 2) the tooling or whatever they built, or 
3) the data that they used or produced. It is generally assumed that if 
some text is in a PDF, and gets a go-ahead from a few reviewers, it 
passes as science. Paper? Code? Data? Environment? Send me an email please.


I am generalizing the situation of course. So, please put your 
pitchforks down. There is a lot of great work, and solid science 
conducted by the SW/LD community. But let's not take our eyes off the 
signal-to-noise ratio.


So, yes, making efforts toward reproducibility is important to redeem 
ourselves. If you think that reproducibility in some other fields is 
more relevant and harder, well, then, I think we should be able to 
manage things on our end, don't you think?


The benefit of having the foundations for reproducibility via LD is 
that we make it possible to query our research process and output, 
compare atomic parts of the experiments, and even detect and fix issues.


If we can't handle the technicality that goes into creating linked 
research, how can we expect the rest of the world to get on board? And we 
are not dealing with a technical problem here. It is blind obedience and 
laziness. There is absolutely nothing stopping us from playing along 
with the archaic industry models and publishing methods temporarily (for 
a number of good and valid reasons), if and only if we first take care 
of ourselves and have complete control over things. Publish on your end, 
pass a stupid fixed copy to the conference/publisher. Then see how 
quickly the research paper landscape changes.


As I've stated at the beginning, it all depends on who we ask and how 
much they care. Do we? If so, what are we going to do about it?


-Sarven
http://csarven.ca/#i





Re: Updated LOD Cloud Diagram - Please enter your linked datasets into the datahub.io catalog for inclusion.

2014-07-25 Thread Sarven Capadisli

On 2014-07-24 14:18, Christian Bizer wrote:

Hi all,

Max Schmachtenberg, Heiko Paulheim and I have crawled the Web of
Linked Data and have drawn an updated LOD Cloud diagram based on the
results of the crawl.

This diagram showing all linked datasets that our crawler managed to
discover in April 2014 is found here:

http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/LODCloudDiagram.png

We also analyzed the compliance of the different datasets with the
Linked Data best practices and a paper presenting the results of the
analysis is found below. The paper will appear at ISWC 2014 in the
Replication, Benchmark, Data and Software Track.

http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf

The raw data used for our analysis is found on this page:

http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/

Our crawler did discover 77 datasets that do not allow crawling via their
robots.txt files and these datasets were not included into our analysis
and are also not included in the current version of the LOD Cloud diagram.

A list of these datasets is found at
http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/tables/notCrawlableDatasets.tsv

In order to give a comprehensive overview of all Linked Data sets that
are currently online, we would like to draw another version of the LOD
Cloud diagram including the datasets that our crawler has missed as well
as the datasets that do not allow crawling.

Thus, if you publish or know about linked datasets that are not in the
diagram or in the list of not crawlable datasets yet, please:

1. Enter them into the datahub.io data catalog until August 8th.

2. Tag them in the catalog with the tag ‘lod’
(http://datahub.io/dataset?tags=lod)

3. Send an email to Max and Chris pointing us at the entry in the catalog.

We will include all datasets into the updated version of the cloud
diagram, that fulfill the following requirements:

1. Data items are accessible via dereferenceable URIs.

2. The dataset sets at least 50 RDF links pointing at other datasets, or
at least one other dataset is setting 50 RDF links pointing at your dataset.

Instructions on how to describe your dataset in the catalog are found here:

https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

Please make sure that you include information about the RDF links
pointing from your dataset into other datasets (field links: ) as well
as a tag indicating the topical category of your dataset, so that we
know how to include it into the diagram.

Please also include an example URI from your dataset into the catalog.

We will start to review the new datasets and to draw the updated version
of the LOD cloud diagram after August 8th.

So please point us at datasets to be included before this date.

Cheers,

Max, Heiko, and Chris

--

Prof. Dr. Christian Bizer

Data and Web Science Research Group

Universität Mannheim, Germany
ch...@informatik.uni-mannheim.de

www.bizer.de



Thank you Chris, Max, Heiko, Andreas, Tobias, et al.,

I find that a diagram based on the crawlable LOD better captures the 
essence of LOD than the alternative methods e.g., curation based on 
catalog groups.


What this diagram reveals is that the LOD landscape is very dynamic and, 
dare I say, not so pretty looking. I suspect that people are going to 
update their presentations and articles from what comes out of this 
effort. And since the LOD Cloud diagrams of the past helped tremendously 
to put a face on the L(O)D effort, I humbly suggest that the next 
version of this diagram should give a bit more attention to its 
presentation:


* The visualisation should speak for itself: this is the LOD that was 
crawled. The problem with this diagram is exactly that it falls short 
there: it sticks to the rules of the previous diagrams while trying to 
communicate something completely different.


* Consider: are node sizes relevant? arc density? which domains should be 
captured? must all nodes be labeled (cut-off point)? should the clusters 
be based on their linkage as opposed to their domain?


* In SVG and legible at ~640px width in portrait (people would want 
to put it on (nearly fixed) views: slides, papers) - as a user of the 
LOD diagram, I don't want people to squint at the magnificence of LOD 
and not see anything more than dbpedia and a bunch of circles with 
pastel colours.


* Use more of the available rectangular space. It doesn't have to be a 
perfectly shaped ellipse.


In summary: the visualisation should be created from scratch.


Now, having said all that, can I have my dataspaces in cornflower blue?

-Sarven
http://csarven.ca/#i





Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Sarven Capadisli

On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote:

Finally I wanted to raise one other troubling observation re: the LOD
Cloud, which was that *only one new dataset was added to the LOD Cloud
group in datahub over a period of twelve months* [2,3]. Jerven just added
the first dataset in 8 months, presumably due to this ongoing discussion.
One can scroll back in time arbitrarily far in the activity log of the
group to see precisely how much activity there has (not) been [3] (e.g.,
Ctrl+F for created). It's not comfortable reading but I think that we,
as a community, should seriously ask ourselves: why has there been so
little new activity?

[1] Tobias Käfer, Jürgen Umbrich, Aidan Hogan, Axel Polleres. Towards a
Dynamic Linked Data Observatory. LDOW 2012.
  - http://aidanhogan.com/docs/dyldo_ldow12.pdf

[2] Aidan Hogan, Claudio Gutierrez Paths towards the Sustainable
Consumption of Semantic Data on the Web. In the Proceedings of the
Alberto Mendelzon Workshop (AMW), Cartagena, Colombia, 4–6 June, 2014.
  - http://aidanhogan.com/docs/amw_2014.pdf

[3] http://datahub.io/group/activity/lodcloud/0


Given the state of CKAN, or at least the version that's running 
datahub.io now, looking solely at the lodcloud group [4] may be 
misleading. I can't recall exact dates now, but there have been a number 
of important changes in the past which made it difficult to pick up on 
the lodcloud-worthy datasets. For instance, when organizations [5] were 
introduced, datasets dangled around in the global pool. To reclaim 
them, you had to create an organization and move the datasets over, with 
permission from the datahub admins. AFAIK (things may have changed) you 
can't have a dataset that belongs to an organization and have it also 
assigned to a group. I suspect that many (myself included) would 
prefer to manage the organization instead of handing it over to group 
admins. So, this may partially be the reason why the lodcloud group is 
not as active.


[4] http://datahub.io/group/lodcloud
[5] http://datahub.io/organization

-Sarven
http://csarven.ca/#i





Re: Updated LOD Cloud Diagram - Missed data sources.

2014-07-25 Thread Sarven Capadisli

On 2014-07-25 20:46, Sarven Capadisli wrote:

On 2014-07-25 20:14, aho...@dcc.uchile.cl wrote:

Finally I wanted to raise one other troubling observation re: the LOD
Cloud, which was that *only one new dataset was added to the LOD Cloud
group in datahub over a period of twelve months* [2,3]. Jerven just added
the first dataset in 8 months, presumably due to this ongoing discussion.
One can scroll back in time arbitrarily far in the activity log of the
group to see precisely how much activity there has (not) been [3] (e.g.,
Ctrl+F for created). It's not comfortable reading but I think that we,
as a community, should seriously ask ourselves: why has there been so
little new activity?

[1] Tobias Käfer, Jürgen Umbrich, Aidan Hogan, Axel Polleres. Towards a
Dynamic Linked Data Observatory. LDOW 2012.
  - http://aidanhogan.com/docs/dyldo_ldow12.pdf

[2] Aidan Hogan, Claudio Gutierrez Paths towards the Sustainable
Consumption of Semantic Data on the Web. In the Proceedings of the
Alberto Mendelzon Workshop (AMW), Cartagena, Colombia, 4–6 June, 2014.
  - http://aidanhogan.com/docs/amw_2014.pdf

[3] http://datahub.io/group/activity/lodcloud/0


Given the state of CKAN, or at least the version that's running
datahub.io now, looking solely at the lodcloud group [4] may be
misleading. I can't recall exact dates now, but there have been a number
of important changes in the past which made it difficult to pick up on
the lodcloud-worthy datasets. For instance, when organizations [5] were
introduced, datasets dangled around in the global pool. To reclaim
them, you had to create an organization and move the datasets over, with
permission from the datahub admins. AFAIK (things may have changed) you
can't have a dataset that belongs to an organization and have it also
assigned to a group. I suspect that many (myself included) would
prefer to manage the organization instead of handing it over to group
admins. So, this may partially be the reason why the lodcloud group is
not as active.

[4] http://datahub.io/group/lodcloud
[5] http://datahub.io/organization


Correction: I was thinking of the lodcloud organization [6]. At one 
point (it may still be the case), you couldn't have a dataset belonging to 
two different organizations. So, either the organization of the dataset 
owner managed it, or it was handed over to [6].


I'm sure Ross Jones can clarify.

[6] http://datahub.io/organization/lodcloud

-Sarven



Re: Updated LOD Cloud Diagram - Please enter your linked datasets into the datahub.io catalog for inclusion.

2014-07-24 Thread Sarven Capadisli

On 2014-07-24 15:16, KANZAKI Masahide wrote:

One quick question: why almost all nodes in social web are labeled as
StatusNet ?


I'm not at all surprised by this.

How many social networking services or pieces of software can you think 
of that make their data available in RDF?


StatusNet [1] [2] was one such piece of software, with identi.ca [3] 
[4] as its flagship site. Nowadays identi.ca is powered differently [5] 
(i.e., no FOAF, AFAIK).


It made a pretty good *dent*, don't you think? ;)

[1] http://status.net/
[2] http://en.wikipedia.org/wiki/StatusNet
[3] http://identi.ca/
[4] http://en.wikipedia.org/wiki/Identi.ca
[5] https://github.com/e14n/pump.io

-Sarven






Making sense of Linked Research

2014-07-22 Thread Sarven Capadisli

http://csarven.ca/sense-of-lsd-analysis

Okay, so, that's the boring, supposedly science-magic stuff.

Go ahead and dereference the URI to RDF.

Once again, IMHO, what's cooler is that it is a human and 
machine-friendly document. This is where Linked Research (aka: Linked 
Science, Semantic Publishing etc.) comes in:


The document is in XHTML+RDFa and has screen and print stylesheets. The 
screen styles are what you would normally see in your Web browser. The 
print style is based on the LNCS template (you know, the one that some 
SW/LD research events force you to use when you submit your SW/LD 
bling-bling in PDF) - so, yes, you can output to PDF. Go ahead and copy 
the stylesheets and make it better: 
http://github.com/csarven/linked-research


See how some of the following vocabularies/ontologies are put to use:

* Semantic Publishing and Referencing: http://purl.org/spar
* Provenance Ontology: http://www.w3.org/TR/prov-o
* Open Provenance Model for Workflows: http://www.opmw.org/ontology
* DC Terms: http://purl.org/dc/terms

There is much room for improvement. No doubt.

The SW/LD research community produces incredible work. Yet, what super 
sucks is that the community cannot get its act together to eat its own 
dogfood. The community is at best stuck on *1-star* 
http://www.w3.org/DesignIssues/LinkedData . Even workshops that are 
about Linked Science or Semantic Publishing etc. are going in 
circles. Mind-boggling.


The community is socially challenged to improve the state of SW 
research. It has a hard time learning from its own discoveries because 
it is stuck on desktop-native document formats, e.g., PDF, as opposed to 
taking it to the Web in its truest sense. It hacks around to attach or 
gather metadata about the research document instead of focusing on the 
valuable things inside those documents, which go far beyond titles, 
abstracts, subjects, references. The community simply cannot 
intelligently mine previously published, *publicly funded* research. It 
reinvents. The community has to jump through hoops and fire to access a 
PDF document that resides in some publisher's website. Whoever is in 
charge of the domain/path calls the shots!



Here is the challenge and a call to all SW/LD researchers. If you think 
your work is interesting enough, even slightly, are willing to put your 
neck out, and want to make an honest contribution towards what we are all 
*essentially* working on, please give this a try:


1. Create and publish your goods: Any resource of significance should 
be given a URI - 
http://www.w3.org/DesignIssues/Axioms.html#Universality2 . That goes 
beyond the document that you submit to conferences. It is the 
hypotheses, experiments, results, workflows, everything in those 
documents, that deserve to be known and accessible (see the Turtle 
sketch after this list). It is so that the next researcher can 
*honestly* pick up from where you left off or compare their work with 
yours. Don't worry, there is plenty of information that needs to be 
text-mined, but we can certainly improve the situation on what can be 
structured and eventually queried for. At least we would have a way to 
look up those atomic resources or discover them.


2. Publish your work on your personal site, university, work, wherever. 
The point is that you should have control over it.


3. Link to other people's goods.

4. Have an open comment system policy. Get reviews, feedback, and 
questions all in there, so that the community can engage and improve the 
research further. It will feed itself.
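
As a rough illustration of point 1 (the sketch referenced above), giving 
URIs to the atomic parts of a piece of research could look like this in 
Turtle; the example.org URIs and the level of detail are hypothetical:

@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# An experiment run, the dataset it used, and the result it generated.
<http://example.org/experiment/1> a prov:Activity ;
    dcterms:description "Evaluation run reported in the article"@en ;
    prov:used <http://example.org/dataset/1> ;
    prov:generated <http://example.org/result/1> .

<http://example.org/result/1> a prov:Entity ;
    prov:wasGeneratedBy <http://example.org/experiment/1> .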



That's it. I'm done.

-Sarven
http://csarven.ca/#i






Re: Education

2014-07-12 Thread Sarven Capadisli

On 2014-07-12 13:02, Hugh Glaser wrote:

The other day I was asked if I would like to run a Java module for some Physics 
& Astronomy students.
I am so far from plain Java and that sort of thing now there was almost a 
cognitive dissonance.

But it did cause me to ponder about what I would do for such a requirement, 
given a blank sheet.

For people whose discipline is not primarily technical, what would a syllabus 
look like around Linked Data as a focus, but also causing them to learn lots 
about how to just do stuff on computers?

How to use a Linked Data store service as schemaless storage:
bit of intro to triples as simply a primitive representation format;
scripting for data transformation into triples - Ruby, Python, PHP, awk or 
whatever;
scripting for http access for http put, delete to store;
simple store query for service access (over http get);
scripting for data post-processing, plus interaction with any data analytic 
tools;
scripting for presentation in html or through visualisation tools.

It would be interesting for scientists and, even more, social scientists, 
archeologists, etc (alongside their statistical package stuff or whatever).
I think it would be really exciting for them, and they would get a lot of 
skills on the way - and of course they would learn to access all this Open Data 
stuff, which is becoming so important.
I’m not sure they would go for it ;-)

Just some thoughts.
And does anyone knows of such modules, or even is teaching them?

Best
Hugh



Hi Hugh,

I teach a few introductory lectures on Linked Data, HTTP, URI, RDF, and 
SPARQL as part of a Web and Internet Technologies course to students in 
Business IT at the Bern University of Applied Sciences. The majority of 
the students do not have a developer profile. The focus of the lessons is 
not the inner technical details of these technologies but, via some 
practical work, what they can take away: understanding some publishing 
and consuming challenges for data on the Web, and potentially 
communicating problems and solutions to their colleagues with technical 
expertise in the future.

What I have observed:

* Before going any further, examples of the state of things and the 
potential of what can be accomplished are vital. If the students are not 
remotely excited, it sets the tone for the remainder of the lectures.


* At first they do not take the importance of HTTP/URI completely 
seriously; there is a "they've seen them, they know them" mentality. The 
exercises around that are about designing their own URI patterns for 
their site/profile, and repeating the importance of Cool URIs and what 
that entails over and over.


* The majority of the students understand the RDF data model and can express 
statements (either using human language or one of the formats). I 
usually bounce back and forth between drawing graphs on the board, and 
showing, dereferencing, browsing RDF resources, and pointing at people 
and objects in and outside of the room.


* As far as their comprehension of the formats goes, i.e., how to write 
statements that are mostly syntactically valid, Turtle/N-Triples lead 
the pack. RDF/XML and RDFa usually turn out to be a disaster. Most do 
not bother with JSON(-LD).


* Once they get the hang of Turtle, they do relatively well in SPARQL. 
I've noticed that it is via SPARQL examples, trial and error, that they 
really get the potential of Linked Data. Along the way, it appears to 
reassure them that RDF and friends are powerful and will come in handy.



IMHO:

Although I welcome them to use any format for exercises and whatnot, I 
encourage them to use Turtle or N-Triples. I tell them that learning 
Turtle is the best investment because they can use that knowledge 
towards SPARQL. However, Turtle comes with a few syntactical traps and 
declarations, such that I secretly wish they would use N-Triples instead 
to learn to create statements, for the sake of simplicity. After all, 
N-Triples is as WYSIWYG as it gets!
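
To make the contrast concrete, here is the same hypothetical pair of 
statements in Turtle shorthand, followed by their N-Triples form, which 
happens to be valid Turtle as well:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/alice#i> a foaf:Person ;
    foaf:name "Alice" .

# The same two statements as N-Triples: no prefixes, no shorthand.
<http://example.org/alice#i> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/alice#i> <http://xmlns.com/foaf/0.1/name> "Alice" .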


With a blank slate:

In most cases, I have a strong bias towards the *nix command-line toolbox 
and shell scripting over alternative programming languages. *Out of the 
box*, the shell environment is remarkable and indispensable. The 
documentation is baked in. Working in this environment leads to some 
design decisions as described in 
http://www.faqs.org/docs/artu/ch01s06.html . One can do everything from 
data processing, transformations, inspection, analysis to 
parallelization here. Besides, it is the perfect glue for everything else.



-Sarven
http://csarven.ca/#i





*Deadline extension* for [CFP] Second International Workshop on Semantic Statistics (SemStats 2014)

2014-07-04 Thread Sarven Capadisli

On 2014-06-24 08:59, Sarven Capadisli wrote:

SemStats 2014 Call for Papers
=

Second International Workshop on Semantic Statistics (SemStats 2014)

Workshop website: http://semstats.org/
Event hashtags: #SemStats #ISWC2014

in conjunction with

ISWC 2014
The 13th International Semantic Web Conference
Riva del Garda - Trentino, Italy, October 19-23, 2014
http://iswc2014.semanticweb.org/


Workshop Summary


The goal of this workshop is to explore and strengthen the relationship
between the Semantic Web and statistical communities, to provide better
access to the data held by statistical offices. It will focus on ways in
which statisticians can use Semantic Web technologies and standards in
order to formalize, publish, document and link their data and metadata.
It follows the 1st Semantic Statistics workshop held at ISWC 2013
(SemStats 2013), http://www.datalift.org/en/event/semstats2013 , which was
a big success, attracting more than 50 participants throughout the day.

The statistical community shows more and more interest in the Semantic
Web. In particular, initiatives have been launched to develop semantic
vocabularies representing statistical classifications and discovery
metadata. Tools are also being created by statistical organizations to
support the publication of dimensional data conforming to the Data Cube
W3C Recommendation. But statisticians see challenges in the Semantic
Web: how can data and concepts be linked in a statistically rigorous
fashion? How can we avoid fuzzy semantics leading to wrong analyses? How
can we preserve data confidentiality?

The workshop will also cover the question of how to apply statistical
methods or treatments to linked data, and how to develop new methods and
tools for this purpose. Except for visualisation techniques and tools,
this question is relatively unexplored, but the subject will obviously
grow in importance in the near future.


Motivation
==

There is a growing interest regarding linked data and the Semantic Web
in the statistical community. A large amount of statistical data from
international and national agencies has already been published on the
web of data, for example Census data from the U.S., Spain or France,
amongst others. In most cases, though, this publication is done by
people external to the statistical office (see also
http://datahub.io/dataset/istat-immigration, http://270a.info/ or
http://eurostat.linked-statistics.org/), which raises issues such as
long-term URI persistence, institutional commitment and data maintenance.

Statistical organisations are also interested in how the Semantic Web might
make it simpler for analysts to use well-described statistical data in
conjunction with other forms of data (e.g. geospatial information,
scientific data, big data from various sources) which are expressed
semantically. The ability to bring together diverse types of data in
this way should enable new insights into multifaceted issues.

Statistical organizations also possess an important corpus of structural
metadata such as concept schemes, thesauri, code lists and
classifications. Some of those are already available as linked data,
generally in SKOS format (e.g. FAO's Agrovoc or UN's COFOG). Semantic
Web standards useful for statisticians have now reached maturity.
The best examples are the W3C Data Cube, DCAT and ADMS vocabularies. The
statistical community is also working on the definition of more
specialized vocabularies, especially under the umbrella of the DDI
Alliance. For example, XKOS extends SKOS for the representation of
statistical classifications, and Disco defines a vocabulary for data
documentation and discovery. The Visual Analytics Vocabulary is a first
step towards semantic descriptions for user interface components
developed to visualize Linked Statistical Data which can lead to
increased linked data consumption and accessibility. We are now at the
tipping point where the statistical and the Semantic Web communities
have to formally exchange in order to share experiences and tools and
think ahead regarding the upcoming challenges.
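
For readers new to these vocabularies, a single observation expressed 
with the Data Cube vocabulary might look like the following Turtle 
sketch; the example.org dataset, code and value are hypothetical:

@prefix qb:             <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure:   <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .

# One statistical observation: a reference area, a period, and a value.
<http://example.org/observation/1> a qb:Observation ;
    qb:dataSet <http://example.org/dataset/population> ;
    sdmx-dimension:refArea <http://example.org/code/area/CH> ;
    sdmx-dimension:refPeriod "2014"^^xsd:gYear ;
    sdmx-measure:obsValue 8139600 .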

Statisticians have a long-standing culture of data integrity, quality and
documentation. They have developed industrialized data production and
publication processes, and they care about data confidentiality and more
generally how data can be used.

The web of data will benefit from getting rich data published by
professional and trustworthy data providers. It is also important that
metadata maintained by statistical offices like concept schemes of
economic or societal terms, statistical classifications, well-known
codes, etc., are available as linked data, because they are of good
quality, well-maintained, and they constitute a corpus to which a lot of
other data can refer.

It seems that after a period where the aim was to publish as many
triples as possible, the focus of the Semantic Web community is now
shifting to having a better quality of data

[CFP] Second International Workshop on Semantic Statistics (SemStats 2014)

2014-06-24 Thread Sarven Capadisli
 a better quality of data and metadata, more coherent 
vocabularies (see the LOV initiative), good and documented naming 
patterns, etc. This workshop aims to contribute to these longer-term 
problems in order to have a significant impact.


The statistics community sometimes faces challenges when trying to adopt 
Semantic Web technologies, in particular:


* difficulty to create and publish linked data: this can be alleviated 
by providing methods, tools, lessons learned and best practices, by 
publicizing successful examples and by providing support.
* difficulty to see the purpose of publishing linked data: we must 
develop end-user tools leveraging statistical linked data, provide 
convincing examples of real use in applications or mashups, so that the 
end-user value of statistical linked data and metadata appears more clearly.
* difficulty to use external linked data in their daily activity: it is 
important to develop statistical methods and tools especially tailored 
for linked data, so that statisticians can get accustomed to using them 
and get convinced of their specific utility.


To conclude, statisticians know how misleading it can be to exploit 
semantic connections without carefully considering and weighing 
information about the quality of these connections, the validity of 
inferences, etc. A challenge for them is to determine, to ensure and to 
inform consumers about the quality of semantic connections which may be 
used to support analysis in some circumstances but not others. The 
workshop will enable participants to discuss these very important issues.



Topics
==

The workshop will address topics related to statistics and linked data. 
This includes but is not limited to:


How to publish linked statistics?

* What are the relevant vocabularies for the publication of statistical 
data?
* What are the relevant vocabularies for the publication of statistical 
metadata (code lists and classifications, descriptive metadata, 
provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software 
packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data 
lifecycle?

* How do we establish, document and share best practices?

How to use linked data for statistics?

* Where and how can we find statistics data: data catalogues, dataset 
descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability, 
etc.)?
* How can we perform data reconciliation, ontology matching and instance 
matching with statistical data?
* How can we apply statistical processes on linked data: data analysis, 
descriptive statistics, estimation, correction?
* How to intuitively represent statistical linked data: visual 
analytics, results of data mining?



Submissions
===

This workshop is aimed at an interdisciplinary audience of researchers 
and practitioners involved or interested in Statistics and the Semantic 
Web. All papers must represent original and unpublished work that is not 
currently under review. Papers will be evaluated according to their 
significance, originality, technical content, style, clarity, and 
relevance to the workshop. At least one author of each accepted paper is 
expected to attend the workshop.


Workshop participation is available to ISWC 2014 attendants at an 
additional cost, see http://iswc2014.semanticweb.org/registration for 
details.


The workshop will also feature a challenge based on Census Data 
published on the web or provided by Statistical Institutes. It is 
expected that data from Australia, France and Italy will be available. 
The challenge will consist of the realization of mashups or 
visualizations, but also of comparisons, alignment and enrichment of the 
data and concepts involved.


We welcome the following types of contributions:

* Full research papers (up to 12 pages)
* Short papers (up to 6 pages)
* Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted 
according to the information for LNCS Authors (see 
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please 
note that (X)HTML(+RDFa) submissions are also welcome as long as the 
layout complies with the LNCS style. Authors can, for example, use the 
template provided at https://github.com/csarven/linked-research. 
Submissions are NOT anonymous. Please submit your contributions 
electronically in PDF format at 
http://www.easychair.org/conferences/?conf=semstats2014 before July 
7, 2014, 23:59 Hawaii Time. All accepted papers will be archived in 
electronic proceedings published by CEUR-WS.org.


See important dates and contact info on the workshop home page.

If you are interested in submitting a paper but would like more 
preliminary information, please contact semstats2...@easychair.org.



Chairs
==

* Sarven Capadisli, University of Leipzig, Germany, and Bern University 
of Applied Sciences, Switzerland


Re: Generate LOD cloud diagram image

2014-05-04 Thread Sarven Capadisli

On 2014-04-22 00:39, Luca Matteis wrote:

I decided to spend the weekend and
come up with a pure JavaScript/CSS3 solution:
http://lmatteis.github.io/void-graph/


http://270a.info/ is now using void-graph.

-Sarven
http://csarven.ca/#i






Re: LOD cloud diagram ?

2014-05-04 Thread Sarven Capadisli

On 2014-04-25 18:44, Kingsley Idehen wrote:


All,

Circa. 2014, we shouldn't be depending on any piece of platform specific
technology to create a graphical representation of datasets across the
entire LOD cloud, or specific sub segments of said cloud.

We should be able to generate graphical representations that include the
following fundamental features:

1. graph visualization
2. live hyperlinks of URIs that denote the cloud data sets.

Why? You end up with yet another beachhead for Linked Data
follow-your-nose pattern.

Whenever we produce output with non-existent or blurry access to HTTP
URIs that denote entities (documents, agents, or other entity types),
we inadvertently miss a powerful opportunity to showcase the basic
Linked Open Data principles value proposition.

What applies to this specific scenario also applies to any other
resource published under the LOD banner.

Shortcut:

Find a visualization tool that can process CSV output.
Feed it a SPARQL URL for a solution where the output format is CSV.
Done.



I think what you are thinking of is more along the lines of 
http://lodlive.it/


Perhaps void-graph's aim to produce the LOD Cloud diagram is a bit 
ambitious at this time. I'm not sure if that itself is still a 
worthwhile exercise anyway. There are efforts like 
http://lod4all.net/ which go in that direction, but the datasets that 
are listed feel curated.


However, what might be more interesting is creating a more realistic LOD 
cloud, perhaps derived from crawls like those of LDspider.


I'm not convinced by the shortcut idea you propose. Just before it, you 
were talking about tapping into the power of true LOD, and then you want 
to process the information via CSV.


IMO, lodlive and void-graph are both in the right direction. void-graph 
in and of itself is a useful tool, at least to create a simple diagram 
for the 270a.info dataspaces.


-Sarven
http://csarven.ca/#i





Re: First Call for Papers: Linked Science 2014 (LISC2014 workshop @ISWC2014)

2014-04-30 Thread Sarven Capadisli

On 2014-04-30 09:38, Erp, M.G.J. van wrote:

Submissions should be formatted according to the Lecture Notes in Computer 
Science guidelines for proceedings available at 
http://www.springer.com/computer/lncs?SGWID=0-164-7-72376-0 and submitted to 
https://www.easychair.org/conferences/?conf=lisc2014. Papers should be 
submitted in PDF format.


Linked Science is brought to you by PDF.

-Sarven
http://csarven.ca/#i





Re: Linked SDMX Data

2014-04-23 Thread Sarven Capadisli

On 2014-04-22 14:18, Sarven Capadisli wrote:

On 2013-08-08 15:17, Sarven Capadisli wrote:

On 03/08/2013 01:04 PM, Sarven Capadisli wrote:

On 02/15/2013 02:42 PM, Sarven Capadisli wrote:

Ahoy hoy,

OECD Linked Data:
http://oecd.270a.info/

BFS Linked Data:
http://bfs.270a.info/

FAO Linked Data:
http://fao.270a.info/

Linked SDMX Data:
http://csarven.ca/linked-sdmx-data


ECB Linked Data:
http://ecb.270a.info/


IMF Linked Data:
http://imf.270a.info/


UIS Linked Data:
http://uis.270a.info/


FRB Linked Data:
http://frb.270a.info/

-Sarven
http://csarven.ca/#i






Re: Linked SDMX Data

2014-04-22 Thread Sarven Capadisli

On 2013-08-08 15:17, Sarven Capadisli wrote:

On 03/08/2013 01:04 PM, Sarven Capadisli wrote:

On 02/15/2013 02:42 PM, Sarven Capadisli wrote:

Ahoy hoy,

OECD Linked Data:
http://oecd.270a.info/

BFS Linked Data:
http://bfs.270a.info/

FAO Linked Data:
http://fao.270a.info/

Linked SDMX Data:
http://csarven.ca/linked-sdmx-data


ECB Linked Data:
http://ecb.270a.info/


IMF Linked Data:
http://imf.270a.info/


UIS Linked Data:
http://uis.270a.info/

-Sarven
http://csarven.ca/#i








Re: How to publish SPARQL endpoint limits/metadata?

2013-10-08 Thread Sarven Capadisli

On 10/08/2013 11:46 AM, Frans Knibbe | Geodan wrote:

I am experimenting with running SPARQL endpoints and I notice the need
to impose some limits to prevent overloading/abuse. The easiest and I
believe fairly common way to do that is to LIMIT the number of results
that the endpoint will return for a single query.

I now wonder how I can publish the fact that my SPARQL endpoint has a
LIMIT and that is has a certain value.


Besides VoID and SD as others already mentioned, here is another take on 
this problem:


While I can see that making the feature or configuration set available 
and machine-friendly is a nice-to-have, I don't know of any tooling out 
there that is capable of factoring in this type of information. Not to 
mention whether the triple statements which contain that information 
will be /easily/ identifiable by users.


I feel that something like this may be better announced in plain ol' 
documentation anyway.
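
That said, if one did want to state it anyway, a rough Turtle sketch 
along these lines could sit in the endpoint's service description or 
VoID file. Note that ex:resultLimit is a made-up property, since neither 
SD nor VoID defines a standard term for result limits:

@prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix ex:   <http://example.org/vocab#> .

<http://example.org/sparql#service> a sd:Service ;
    sd:endpoint <http://example.org/sparql> ;
    # Hypothetical property: maximum number of results returned per query.
    ex:resultLimit 10000 .

<http://example.org/dataset> a void:Dataset ;
    void:sparqlEndpoint <http://example.org/sparql> .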


-Sarven
http://csarven.ca/#i





Re: Minimizing data volume

2013-09-09 Thread Sarven Capadisli

On 09/09/2013 11:47 AM, Frans Knibbe | Geodan wrote:

Hello,

In my line of work (geographical information) I often deal with high
volume data. The high volume is caused by single facts having a big
size. A single 2D or 3D geometry is often encoded as a single text
string and can consist of thousands of numbers (coordinates). It is easy
to see that this can cause performance issues with transferring and
processing data. So I wonder about the state of the art in minimizing
data volume in Linked Data. I know that careful publication of data will
help a bit: multiple levels of detail could be published, coordinates
could use significant digits (they almost never do), but it seems to me
that some kind of compression is needed too. Is there something like a
common approach to data compression at the moment? Something that is
understood by both publishers and consumers of data?

Regards,
Frans


You might want to look into RDF HDT [1].

[1] http://www.rdfhdt.org/

-Sarven
http://csarven.ca/#i






Re: HOWOTO make a WebID manually - was: WebID Frustration

2013-08-08 Thread Sarven Capadisli

On 08/07/2013 01:21 AM, Henry Story wrote:


On 7 Aug 2013, at 01:02, Sarven Capadisli i...@csarven.ca wrote:


On 08/06/2013 01:37 PM, Hugh Glaser wrote:

Well, RWW.IO looked exciting, so I decided to start with it.
And it seemed a good idea to have an account, so I decided I would finally 
create a WebID login - I know that lots of people think that this is the Way 
Ahead.
I have a foaf file (actually more than one), and trawling the web, it seems 
that if I have a foaf file I can use it for WebID.
I certainly don't want to create it on some other site - I need another account 
like I need a hole in the head - in fact, that is what is meant to be good 
about WebID!
Surely it isn't just one last new account.

Anyway, you can guess that a while later I still don't seem to have managed it.
I have read any number of pages that give me simple guides to doing stuff, 
with links to things that should help, etc. (often dead).
I confess that I was definitely looking for the easiest way - for example, 
downloading a program to run just doesn't seem the sort of thing I want to do 
for something that is meant to be simple.
Sorry if that all sounds provocative, but I am a bit frustrated!

So have I missed something here?
Is there really not a page that will really work for me?
I'm using Safari on a Mac, by the way.
And I'm trying to login in to https://hugh.rww.io

Best
Hugh


Just dropping this here for anyone that finds it useful.

The following will get you a public key that you can use in your WebID profile, 
a certificate that you can use to digitally sign your emails as well as to 
authenticate from your Web browser:

Create a public/private key as you would to SSH to networks:

$ ssh-keygen

Add your URI and email in openssl.cnf, then create the certificate using your 
private key from above. Import from your email client:

$ openssl req -x509 -new -config openssl.cnf -days 36500 -key id_rsa -out 
id_rsa.crt

Export to PKCS #12 and import from your browser:

$ openssl pkcs12 -export -in id_rsa.crt -inkey id_rsa -out id_rsa.p12

Copy/paste certificate signature value into your WebID profile.


Thanks Sarven. We should add the above to a HOWTO, and add the following 
perhaps.

Just use this as a pattern:

@prefix cert: <http://www.w3.org/ns/auth/cert#> .

?webid cert:key [ cert:modulus "..."^^xsd:hexBinary ;
   cert:exponent 65537 ] . # replace number with actual value

as described in
   
https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/tls-respec.html#publishing-the-certificate-data-in-a-webid-profile-document


Where would be a good place to put this? We used to have this:
http://www.w3.org/wiki/Foaf%2Bssl/HOWTO

But I think it would be good to move it to an official wiki spot on our 
Community wiki.

Henry



-Sarven
http://csarven.ca/#i



Social Web Architect
http://bblfish.net/


Henry, I've added it to the wiki for now before it gets forgotten. It 
can be relocated later.


http://www.w3.org/wiki/index.php?title=Foaf%2Bssl%2FHOWTO&diff=67728&oldid=61017

-Sarven
http://csarven.ca/#i





  1   2   >