Hi Debajyoti,

I am not quite sure what your idea is, or what you mean by the "creation and generation of basic triples, chaining of triples etc. in Python". Didn't you use the extraction framework to get triples from Wikipedia/Wiktionary? Actually, I am not sure whether you are talking about Wiktionary or Wikipedia.
All the best,
Sebastian




On 17.04.2013 12:01, Debajyoti Datta wrote:
Dear Dbpedia Developers,

After a couple of days of fiddling with the creation and generation of basic triples, chaining of triples, etc. in Python, here are some of the things I have finally managed to pen down.

This is related to giving users the ability to configure and extract triples, and to having a very responsive, functional GUI for the same.

A GUI can be the key here, because a visual cue for feed-forward inference will be a great and effective way of generating triples and of allowing users to modify existing ones. (Discussed in detail later.)

Firstly, almost every user can generate some new triples which are obvious choices, like the deterministic ones: if I know the distance between two places is 16 km, then the corresponding distance between the two is 9.942 miles. A GUI can make this efficient by providing similar mappings (like km to miles, dollars to other currencies, and so on...). Essentially, from an already existing triple one can infer new information, and the GUI will give users the option to create these new links, or to add a completely different link, again based on a certain dictionary of words or pre-specified rules.
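A minimal sketch of what I mean, in Python (the predicate names and the conversion table are placeholders I made up, not actual DBpedia properties):

```python
# Derive new triples from existing ones via known unit conversions.
# Triples are plain (subject, predicate, object) tuples for illustration.

CONVERSIONS = {
    # (source predicate, derived predicate): multiplication factor
    ("distanceKm", "distanceMiles"): 1 / 1.609344,
}

def derive_triples(triples):
    """Yield the new triples implied by the conversion table."""
    for subj, pred, obj in triples:
        for (src, dst), factor in CONVERSIONS.items():
            if pred == src:
                yield (subj, dst, round(float(obj) * factor, 3))

base = [("Kolkata_Mumbai_route", "distanceKm", 16)]
print(list(derive_triples(base)))
# → [('Kolkata_Mumbai_route', 'distanceMiles', 9.942)]
```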

The GUI will have options (maybe based on already-used keywords!) to extract these triples for other feed-forward inferences, like classifications (e.g. if Kolkata, Mumbai and Chennai are port cities, then they are close to the sea, and so on...).
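For instance, a toy forward-chaining pass (the "PortCity"/"closeToSea" rule is invented just to illustrate the idea):

```python
def apply_rules(triples, rules):
    """One forward-chaining pass: for each triple whose (predicate, object)
    matches a rule's antecedent, add the rule's consequent for that subject."""
    derived = set(triples)
    for (pred, obj), (new_pred, new_obj) in rules.items():
        for s, p, o in triples:
            if (p, o) == (pred, obj):
                derived.add((s, new_pred, new_obj))
    return derived

rules = {("type", "PortCity"): ("closeToSea", "true")}
facts = {("Kolkata", "type", "PortCity"),
         ("Mumbai", "type", "PortCity"),
         ("Chennai", "type", "PortCity")}
print(sorted(apply_rules(facts, rules)))
```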

Obviously, what counts as "information" and which rules are appropriate will vary with the context, but the idea is that by using rules, along with some knowledge and outside services, we can generate new assertions from our existing set of assertions. This part is slightly easier to achieve when one has a web front-end (or rather, a visually pleasing web front-end) that guides the user.

This is applicable not just in this context but also to geocoding rules: when someone is googling the location of a place via its address, the web frontend can also return the latitude, the longitude and an image. This would be another way in which Wiktionary users can create new triples involving places.
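Roughly like this (the in-memory gazetteer is hypothetical; a real frontend would query a geocoding service, and the `geo:` property names are assumed):

```python
# Sketch: turn a gazetteer/geocoder hit into geo triples for a place.
GAZETTEER = {
    "Leipzig": (51.3397, 12.3731),  # illustrative coordinates
}

def geo_triples(place):
    """Return latitude/longitude triples for a known place."""
    lat, lon = GAZETTEER[place]
    return [(place, "geo:lat", lat), (place, "geo:long", lon)]

print(geo_triples("Leipzig"))
```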

Another way for Wiktionary users to create new triples would again involve the kind of information that users themselves provide best, for example the affairs of famous personalities (like "Britney broke up with the famous actor Harry to be with another famous actor, Potter"). This sort of triple generation can also be very helpful, and having a frontend GUI for it would only add to the existing knowledge base.

The GUI should provide a way for users to create these triples in a logical way, without duplication, based on certain rules. For example, if someone wants to add that famous individual X and famous individual Y are "dating", then using other words for dating, like "going_out", "affair" or "girlfriend/boyfriend", should not result in duplicated or redundant triples. (I wish I had thought of a better example!)
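One simple way to do this is to canonicalise synonymous predicates before a triple is stored (the synonym table here is invented for illustration):

```python
# Map synonymous predicate labels to one canonical predicate, so that
# "going_out", "affair" etc. all collapse into a single "dating" triple.
CANONICAL = {
    "going_out": "dating",
    "affair": "dating",
    "girlfriend/boyfriend": "dating",
}

def normalise(triple):
    """Replace a synonymous predicate with its canonical form."""
    s, p, o = triple
    return (s, CANONICAL.get(p, p), o)

store = set()
for t in [("X", "dating", "Y"), ("X", "going_out", "Y"), ("X", "affair", "Y")]:
    store.add(normalise(t))
print(store)  # a single ("X", "dating", "Y") triple, no duplicates
```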

Also, these logs of user actions should all be stored for the further creation of new triples, which can be done with machine learning algorithms in multiple ways. For example, a new restaurant in an area popular with the Chinese population and culture will likely be a Chinese restaurant, and this can be learned with popular machine learning algorithms like k-means. The problem here is the amount of data and the number of iterations... (I guess coming up with optimal algorithms will be difficult and/or resource-intensive, but it will be something challenging to try out!)
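To make the k-means idea concrete, here is a toy run on made-up 2-D points (a real pipeline would cluster features extracted from the user-action logs, probably with a library rather than this hand-rolled version):

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest centroid, recompute."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of "locations"
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
print(centroids)
```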

The technical stuff: I still have to figure out the correct approach to processing and displaying RDF data. A lot of that has already been done with XML technologies; highly sophisticated frameworks like Cocoon provide means for complex XML-based output generation tasks. Apparently, processing RDF data at the XML level with XML tools is possible if one preprocesses the RDF and derives a canonical serialization of it. Actually, a lot more needs to be figured out in this respect, and I will wait for your feedback before going ahead.
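As a small proof of the XML-level idea: once the RDF is in a predictable serialization, plain XML tooling can pull the triples out (the RDF/XML snippet below is invented, and this only handles the simplest Description/property shape):

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"

doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="{EX}Kolkata">
    <ex:portCity>true</ex:portCity>
  </rdf:Description>
</rdf:RDF>"""

# Walk rdf:Description elements and read off (subject, predicate, object).
triples = []
for desc in ET.fromstring(doc):
    subj = desc.get(f"{{{RDF}}}about")
    for prop in desc:
        pred = prop.tag.replace("{", "").replace("}", "")
        triples.append((subj, pred, prop.text))

print(triples)
```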

About me:

I am a student of information systems with a lot of programming experience. I fell in love with MOOCs, and thus Human Computer Interaction <http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWU44OHQ0WGVKX2s>, Machine Learning <http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KWTUxeU9BWkZUaXc> and Social Network Analysis <http://www.google.com/url?q=https%3A%2F%2Fdocs.google.com%2Fopen%3Fid%3D0BwMKr-KwTT8KNjdQb1RNQXk1cEE> happened! I came across a brilliant initiative by openHPI, the "Semantic Web" course, and will probably go into semantic web research in the future. I would love feedback on the above ideas and on their feasibility. It is true that my experience with the semantic web is limited, but I have a lot of coding experience in machine learning and Python, as well as other open source contributions (like being designer and developer of open-advice.org <http://open-advice.org> under Lydia Pintscher of KDE), winner of a Google Developer Group hackathon for the best business app, and some more similar but not so interesting stuff...

Last summer I was a Google Summer of Code intern at Connexions, and here <http://blog.cnx.org/2012/08/google-summer-of-code-2012-comes-to.html> is a post from my mentoring organization about my work.

Looking forward to all your feedback.

Best Regards,
Debajyoti Datta


------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter


_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
