Hi Debajyoti,
I am not quite sure what your idea is and what you mean by the
"creation and generation of basic triples, chaining of triples etc. in
python". Didn't you use the extraction framework to get triples from
Wikipedia/Wiktionary? Actually, I am not sure whether you are talking
about Wiktionary or Wikipedia.
All the best,
Sebastian
On 17.04.2013 12:01, Debajyoti Datta wrote:
Dear Dbpedia Developers,
After a couple of days of fiddling with the creation and generation of
basic triples, chaining of triples etc. in Python, here are some of
the things I have finally managed to pen down.
This is about giving users the ability to configure and extract
triples, with a very responsive, functional GUI for the same.
A GUI can be the key here, because a visual cue for feed-forward
inference will be a great and effective way to generate triples and to
let users modify existing triples. (Discussed in detail later.)
Firstly, almost every user can generate some new triples that are
obvious choices, like the deterministic ones: if I know the distance
between two places is 16 km, then the corresponding distance between
the two is about 9.94 miles. A GUI can make this efficient by offering
similar mappings (like km to miles, dollars to other currencies, and
so on). Essentially, one can infer new information from an already
existing triple, and the GUI will give users the option to create
these new links, or to add a completely different link, again based on
a certain dictionary of words or pre-specified rules.
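The unit-conversion idea above can be sketched in a few lines of
Python. The triple shape (subject, predicate, value) and the predicate
names here are illustrative assumptions, not any DBpedia API:

```python
# Conversion rules: source predicate -> (derived predicate, factor).
# Both predicate names and the factor table are hypothetical.
CONVERSIONS = {
    "distanceKm": ("distanceMiles", 0.621371),
    "priceUsd": ("priceEur", 0.92),  # example rate; a real GUI would query a service
}

def infer_conversions(triples):
    """Derive new triples from existing numeric ones via unit conversion."""
    derived = []
    for subject, predicate, value in triples:
        if predicate in CONVERSIONS:
            new_pred, factor = CONVERSIONS[predicate]
            derived.append((subject, new_pred, round(value * factor, 3)))
    return derived

triples = [("Kolkata_Mumbai", "distanceKm", 16.0)]
print(infer_conversions(triples))  # [('Kolkata_Mumbai', 'distanceMiles', 9.942)]
```

A GUI would simply present the applicable entries of such a table next
to each numeric triple the user is viewing.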
The GUI will also offer options (maybe based on already used
keywords!) to extract these triples for other feed-forward inferences,
such as classifications (e.g. if Kolkata, Mumbai and Chennai are port
cities, then they are close to the sea, and so on).
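Such feed-forward inferences can be sketched as simple if-then rules
over triples. All predicate and class names below are made up for
illustration:

```python
# Each rule: if a triple matches (cond_pred, cond_obj), assert
# (subject, new_pred, new_obj). Names are illustrative only.
RULES = [
    (("cityType", "port city"), ("nearTo", "sea")),
]

def apply_rules(triples, rules):
    """Return the set of new triples implied by the rules."""
    inferred = set()
    for subject, predicate, obj in triples:
        for (cond_pred, cond_obj), (new_pred, new_obj) in rules:
            if predicate == cond_pred and obj == cond_obj:
                inferred.add((subject, new_pred, new_obj))
    # keep only assertions that are actually new
    return inferred - set(triples)

facts = [("Kolkata", "cityType", "port city"),
         ("Mumbai", "cityType", "port city"),
         ("Delhi", "cityType", "inland city")]
print(sorted(apply_rules(facts, RULES)))
```

The GUI's role would be to show the user which rules fired and let
them accept or reject each inferred triple before it is stored.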
Obviously, what counts as "information" and which rules are
appropriate will vary depending on the context, but the idea is that
by using rules along with some knowledge and outside services we can
generate new assertions from our existing set of assertions, and this
part is slightly easier to achieve with a web front-end (or rather a
visually pleasing web front-end) that guides the user.
This applies beyond the current context, for example to geocoding
rules: when someone looks up the location of a place via its address,
the web front-end can also return the latitude, longitude and an
image. That would be another way for Wiktionary users to create new
triples involving places.
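As a minimal sketch of the geocoding step, assume a lookup that maps
an address to coordinates (here a stand-in dictionary; a real
front-end would call a geocoding service) and emits place triples:

```python
# Stand-in for a geocoding service response; values are approximate
# and for illustration only.
FAKE_GEOCODER = {"Victoria Memorial, Kolkata": (22.5448, 88.3426)}

def triples_for_address(place, address):
    """Create latitude/longitude triples for a place from its address."""
    lat, lon = FAKE_GEOCODER[address]  # real code: query a geocoding API
    return [(place, "latitude", lat), (place, "longitude", lon)]

print(triples_for_address("Victoria_Memorial", "Victoria Memorial, Kolkata"))
```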
Another way for Wiktionary users to create new triples would again be
the kind of information that users provide best! For example, affairs
of famous personalities (say, Britney broke up with some famous actor
Harry to be with another famous actor Potter). This sort of triple
generation can also be very helpful, and a front-end GUI for it would
only add to the existing knowledge base.
The GUI should provide a way for users to create these triples in a
logical way, without duplication, based on certain rules. For example,
if someone wants to add that famous individual X and famous individual
Y are "dating", then synonymous predicates such as "going_out",
"affair" or "girlfriend/boyfriend" should not result in duplicated or
redundant triples. (I wish I had thought of a better example!)
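One way to sketch this deduplication is to normalise synonymous
predicates to a single canonical one before storing. The synonym
table below is an assumption for illustration:

```python
# Map predicate synonyms to one canonical predicate (illustrative).
CANONICAL = {"dating": "dating", "going_out": "dating",
             "affair": "dating", "girlfriend": "dating",
             "boyfriend": "dating"}

def add_triple(store, subject, predicate, obj):
    """Insert a triple, normalising the predicate so synonyms collapse."""
    predicate = CANONICAL.get(predicate, predicate)
    store.add((subject, predicate, obj))  # a set already rejects duplicates
    return store

store = set()
add_triple(store, "X", "dating", "Y")
add_triple(store, "X", "going_out", "Y")  # normalised to the same triple
print(len(store))  # 1
```

In the GUI this could surface as an autocomplete that suggests the
canonical predicate as the user types a synonym.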
These logs of user actions should then all be stored for the further
creation of new triples, which can be done through machine learning
algorithms in multiple ways. For example, a new restaurant in an area
popular with the Chinese population and culture will likely be a
Chinese restaurant, and this can be inferred with popular machine
learning algorithms such as k-means. The problem here is the amount of
data and the number of iterations (I guess coming up with optimal
algorithms will be difficult and/or resource-intensive, but it will be
something challenging to try out!).
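For concreteness, here is a tiny hand-rolled k-means on 2D points
(think of them as restaurant coordinates); the data is made up, and in
practice one would use a library implementation such as scikit-learn:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive k-means: assign points to nearest centroid, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                  (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # recompute centroids as cluster means (keep old one if empty)
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# two well-separated neighbourhoods
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The scaling worry above is real: plain k-means is O(points × k ×
iterations) per run, which is why library implementations and
mini-batch variants matter at DBpedia scale.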
The technical stuff: I still have to figure out the correct approach
to processing and displaying RDF data. A lot of that has already been
done based on XML technologies; highly sophisticated frameworks like
Cocoon provide means for complex XML-based output generation
tasks. Apparently, processing RDF data at the XML level with XML tools
is possible when one preprocesses the RDF and derives a canonical
serialization of it. A lot more needs to be figured out in this
respect, and I will wait for your feedback before I go ahead.
About me:
I am a student of information systems with a lot of programming
experience. I fell in love with MOOCs, and thus the following Human
Computer Interaction
<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWU44OHQ0WGVKX2s>,
Machine Learning
<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWTUxeU9BWkZUaXc> and
Social Network Analysis
<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KNjdQb1RNQXk1cEE> happened!
I came across a brilliant initiative by openHPI, the "Semantic Web"
course, and will probably go into semantic web research in the future.
I would love to have feedback on the above ideas and their
feasibility. It is true that my experience with the semantic web is
limited, but I have a lot of coding experience in machine learning and
Python, and other open source contributions (like designer and
developer of open-advice.org <http://open-advice.org> under Lydia
Pintscher of KDE), plus winning a Google Developer Group hackathon for
the best business app, and some more similar but not so interesting
stuff...
Last summer I was a Google Summer of Code intern at Connexions, and
here
<http://blog.cnx.org/2012/08/google-summer-of-code-2012-comes-to.html>
is a post from my mentoring organization about my work.
Looking forward to all your feedback.
Best Regards,
Debajyoti Datta
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org