Dear DBpedia Developers,

After a couple of days of experimenting in Python with the creation and
generation of basic triples, chaining of triples, and so on, here are
some of the ideas I have finally managed to pen down.

This proposal is about giving users the ability to configure and extract
triples, backed by a responsive, functional GUI for the same.

A GUI can be key here, because visual cues for feed-forward inference are
an effective way to generate triples and to let users modify existing
ones. (Discussed in detail later.)

Firstly, almost every user can generate some new triples that are obvious
choices, like the deterministic ones: if I know the distance between two
places is 16 km, then the corresponding distance between the two is about
9.942 miles. A GUI can make this efficient by offering similar mappings
(km to miles, dollars to other currencies, and so on). Essentially, one
can infer new information from an already existing triple, and the GUI
will give users the option to create these new links, or to add a
completely different link, based on a dictionary of terms or
pre-specified rules.
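As a minimal sketch of such a deterministic rule (plain Python, with triples as simple (subject, predicate, object) tuples; the predicate names "distanceKm"/"distanceMiles" are placeholders I made up, not DBpedia vocabulary):

```python
# Sketch: derive new triples from existing ones via a unit-conversion rule.
# Triples are plain (subject, predicate, object) tuples; the predicate
# names here are illustrative placeholders, not DBpedia terms.

KM_TO_MILES = 0.621371

def infer_converted_triples(triples):
    """Return the new triples implied by the km -> miles conversion rule."""
    derived = []
    for s, p, o in triples:
        if p == "distanceKm":
            derived.append((s, "distanceMiles", round(float(o) * KM_TO_MILES, 3)))
    return derived

existing = [("Kolkata__Mumbai", "distanceKm", 16)]
print(infer_converted_triples(existing))
# 16 km comes out as roughly 9.942 miles
```

The same pattern generalises to any fixed mapping (currencies, temperatures, etc.) by adding more predicate/conversion pairs.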

The GUI will also have options (perhaps based on already-used keywords) to
extract these triples for other feed-forward inferences such as
classification (e.g. if Kolkata, Mumbai, and Chennai are port cities, then
they are close to the sea, and so on).
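A sketch of such a classification rule in the same tuple style (the class name "PortCity" and the predicate "locatedNear" are invented for illustration):

```python
# Sketch: forward-chaining classification rules over tuple triples.
# If a subject has type PortCity, also assert it is located near the sea.
# The rule table is illustrative, not an actual DBpedia rule set.
RULES = {
    "PortCity": [("locatedNear", "Sea")],
}

def forward_chain(triples):
    """Return all triples derived by applying RULES to type assertions."""
    derived = []
    for s, p, o in triples:
        if p == "type" and o in RULES:
            for pred, obj in RULES[o]:
                derived.append((s, pred, obj))
    return derived

facts = [("Kolkata", "type", "PortCity"),
         ("Mumbai", "type", "PortCity"),
         ("Chennai", "type", "PortCity")]
for t in forward_chain(facts):
    print(t)
```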

Obviously, what counts as "information" and which rules are appropriate
will vary depending on the context, but the idea is that by using rules
along with some knowledge and outside services, we can generate new
assertions from our existing set of assertions; this part is easier to
achieve when one has a web front-end (ideally a visually pleasing one)
that guides the user.

This is not only applicable in this context: with geocoding rules, when
someone looks up the location of a place by its address, the web front-end
can also return the latitude, longitude, and an image. This would be
another way in which Wiktionary users can create new triples involving
places.
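A rough sketch of how that could look (the `fake_result` dict and the predicate names stand in for whatever geocoding service the front-end would actually call; they are assumptions, not a real API):

```python
# Sketch: turn a geocoder response into place triples. `result` stands in
# for the response of a hypothetical geocoding service; the predicate
# names ("lat", "long", "depiction") are placeholders.

def triples_from_geocode(place, result):
    """Map a geocoder response dict onto (subject, predicate, object) tuples."""
    return [
        (place, "lat", result["lat"]),
        (place, "long", result["lon"]),
        (place, "depiction", result["image_url"]),
    ]

fake_result = {"lat": 22.5726, "lon": 88.3639,
               "image_url": "http://example.org/kolkata.jpg"}
for t in triples_from_geocode("Kolkata", fake_result):
    print(t)
```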

Another way for Wiktionary users to create new triples is through
information that users themselves know best, such as the relationships of
famous personalities (e.g. Britney broke up with famous actor Harry to be
with another famous actor, Potter). This sort of triple generation can
also be very helpful, and a front-end GUI for it would only add to the
existing knowledge base.

The GUI should ensure that users create these triples in a logical way,
without duplication, based on certain rules. For example, if someone wants
to add that famous individual X and famous individual Y are "dating", then
near-synonyms such as "going_out", "affair", or "girlfriend/boyfriend"
should not result in duplicate or redundant triples. (I wish I had thought
of a better example!)
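A minimal sketch of how the back-end could canonicalise predicates before insertion (the synonym table is made up for the example):

```python
# Sketch: map synonym predicates onto one canonical form so near-duplicate
# triples collapse into a single assertion. The synonym table is
# illustrative, not a real vocabulary mapping.
CANONICAL = {"going_out": "dating", "affair": "dating",
             "girlfriend": "dating", "boyfriend": "dating"}

def add_triple(store, s, p, o):
    """Insert a triple, rewriting synonym predicates to the canonical one."""
    store.add((s, CANONICAL.get(p, p), o))  # set semantics: duplicates are no-ops

store = set()
add_triple(store, "X", "dating", "Y")
add_triple(store, "X", "going_out", "Y")
add_triple(store, "X", "affair", "Y")
print(store)  # only the single canonical triple survives
```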

These logs of user actions should then all be stored, so that further
triples can be created from them with machine learning algorithms in
multiple ways. For example, a new restaurant in an area popular with the
Chinese population and culture is likely to be a Chinese restaurant, and
this can be predicted with popular machine learning algorithms like
k-means. The problem here is the sheer amount of data and the number of
iterations involved; I suspect coming up with optimal algorithms will be
difficult and/or resource-intensive, but it would be something challenging
to try out!
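As a toy illustration of the clustering idea (pure Python on 2-D points; a real deployment would obviously need a scalable implementation and real features from the logs):

```python
# Toy k-means sketch over 2-D points, just to show the shape of the idea.
# Initialisation and distance are deliberately naive.

def kmeans(points, k, iters=20):
    centroids = points[:k]  # naive initialisation: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            i = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                            + (p[1] - centroids[i][1]) ** 2)
            clusters[i].append(p)
        # recompute centroids as cluster means (keep old one if cluster empty)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# e.g. restaurant coordinates forming two obvious neighbourhoods
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, clus = kmeans(pts, 2)
print(sorted(len(c) for c in clus))  # two clusters of three points each
```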

The technical part: I still have to figure out the right approach to
processing and displaying RDF data. A lot of this has already been done
with XML technologies; highly sophisticated frameworks like Cocoon provide
the means for complex XML-based output generation tasks. Apparently,
processing RDF data at the XML level with XML tools is possible if one
preprocesses the RDF and derives a canonical serialization of it. A lot
more needs to be figured out in this area, and I will wait for your
feedback before going ahead.
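A rough sketch of what a canonical serialization buys you, again with plain tuples (real RDF canonicalisation, with blank nodes and literals, is considerably harder; this only covers ground triples):

```python
# Sketch: serialise triples as sorted N-Triples-style lines so that any
# two equal graphs produce byte-identical output, which downstream
# (XML-level or diff-level) tooling can then rely on. Ground triples only;
# blank-node canonicalisation is a much harder problem.

def canonical_ntriples(triples):
    lines = ["<%s> <%s> <%s> ." % t for t in triples]
    return "\n".join(sorted(lines))

g1 = [("b", "p", "c"), ("a", "p", "b")]
g2 = [("a", "p", "b"), ("b", "p", "c")]
print(canonical_ntriples(g1) == canonical_ntriples(g2))  # True
```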

About me:

I am a student of information systems with a lot of programming
experience. I fell in love with MOOCs, and so Human Computer
Interaction<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWU44OHQ0WGVKX2s>,
Machine
Learning<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWTUxeU9BWkZUaXc>,
and Social Network
Analysis<http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KNjdQb1RNQXk1cEE>
happened!
I came across a brilliant initiative by openHPI, the "Semantic Web"
course, and will probably go into semantic web research in the future. I
would love feedback on the above ideas and their feasibility. It is true
that my experience with the semantic web is limited, but I have plenty of
coding experience in machine learning and Python, as well as other open
source contributions (e.g. designer and developer of open-advice.org under
Lydia Pintscher of KDE). I also won a Google Developer Group hackathon for
the best business app, among other, less interesting things...

Last summer I was a Google Summer of Code intern at Connexions, and
here<http://blog.cnx.org/2012/08/google-summer-of-code-2012-comes-to.html>
is a post from my mentoring organization about my work.

Looking forward to all your feedback.

Best Regards,
Debajyoti Datta
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
