Re: [nepomuk-kde] natural language processing + nepomuk

Jordi Polo Tue, 28 Oct 2008 08:38:55 -0700

On Tue, Oct 28, 2008 at 9:00 PM, Sebastian Trüg <[EMAIL PROTECTED]> wrote:


> On Tuesday 28 October 2008 06:29:19 Jordi Polo wrote:
> > I thought the desktop search problem was already solved.
>
> well, not really. What we can do is map fields and values to rdf properties
> and resources. In other words a query "foo:bar" will be translated by
> matching "foo" to property labels and then depending on the range of the
> property "bar" will either be matched to resource labels of the type or to
> literal strings.
> But that still means that users need to know the field names. A typical
> problem is:
>
> * I want to get all emails from A. Imagine the property is named "sender".
>   Now the user needs to do a "sender:A" query. "from:A" would not work.
>

So is that all the current libnepomukquery library functionality?
The nepomuk search runner was added to kdebase trunk just today, IIRC.
Improving libnepomukquery will improve it too, so it seems a good place to
start coding something, I guess.
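To make the limitation Sebastian describes concrete, here is a minimal Python sketch of exact-label field matching. It assumes a hypothetical table of property labels: the `ex:` identifiers and the helper names are made up for illustration and are not libnepomukquery's API, and the resource-vs-literal range handling is left out.

```python
# Hypothetical property labels: "ex:" identifiers are placeholders,
# not real Nepomuk ontology URIs.
PROPERTY_LABELS = {
    "sender": "ex:sender",
    "subject": "ex:subject",
    "title": "ex:title",
}


def parse_field_query(query):
    """Split "field:value" into (field, value); return None without a colon."""
    field, sep, value = query.partition(":")
    field, value = field.strip().lower(), value.strip()
    if not sep or not field or not value:
        return None
    return field, value


def translate(query, labels=PROPERTY_LABELS):
    """Map a "field:value" query to (property, value) by exact label match.

    Returns None when the field is not a known property label, which is
    the "from:A" problem: the user has to guess the label "sender".
    """
    parsed = parse_field_query(query)
    if parsed is None:
        return None
    field, value = parsed
    prop = labels.get(field)
    if prop is None:
        return None
    return prop, value


print(translate("sender:A"))  # ('ex:sender', 'A')
print(translate("from:A"))    # None
```

Everything the user types has to hit a stored label exactly, which is why richer labels and fuzzier matching come up in the rest of this thread.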


> One possibility is to use systems such as WordNet to enrich the labels of
> properties.


Only for English or the few other languages that have freely available
wordnets.
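A rough sketch of what "enriching labels" could look like, assuming hand-written per-language synonym lists as a stand-in for real WordNet lookups (the lists, the `ex:` identifiers and the function name are hypothetical). A language without a list gets no expansion, which is exactly the coverage problem mentioned above.

```python
# Hypothetical synonym lists standing in for WordNet lookups. Languages
# without an entry simply get no expansion.
BASE_LABELS = {"sender": "ex:sender", "subject": "ex:subject"}

SYNONYMS = {
    "en": {"sender": ["from", "author"], "subject": ["topic", "about"]},
    "es": {"sender": ["remitente"], "subject": ["asunto", "tema"]},
}


def build_label_index(lang):
    """Return {label or synonym: property} for one language."""
    index = dict(BASE_LABELS)
    for label, words in SYNONYMS.get(lang, {}).items():
        prop = BASE_LABELS[label]
        for word in words:
            # Never let a synonym override a real property label.
            index.setdefault(word.lower(), prop)
    return index


en_index = build_label_index("en")
print(en_index.get("from"))                 # ex:sender
print(build_label_index("ja").get("from"))  # None
```

With such an index in place of the plain label table, "from:A" would resolve to the sender property, but only for languages somebody has provided synonyms for.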





> AFAIK something like that has already been tried for Beagle++ by
> L3S. Could be interesting to look at this for KDE.
> But it would be even cooler to allow natural language queries and match
> these to fields and values. I am really no expert on the topic, on the
> contrary. Thus, I am just throwing around some ideas.
>
> This relates to your idea with runner actions, I think.
>

I see two use cases:

A user who wants something with as few keystrokes as possible --> runners
and krunner
A user who wants to use natural language --> the system I'm thinking about



>
> > About analysis for keyword extraction or concept extraction: I am a
> > newborn, so to speak, in ontology-based technologies, and surely the
> > people on this list who are much more involved in semantic web
> > technologies can comment further on this. But I would say that there
> > are already open source inference engines, and I guess that there are
> > already known solutions for more complex processing.
> > I have also seen some papers on "text to ontology" solutions. IIRC there
> > is some research on keyword extraction without a corpus. Given that we
> > aim for multilingualism, we need this kind of technique. I will take a
> > look at the results they got.
> > Also, for concept extraction, I think some people are using FCA (Formal
> > Concept Analysis), which could be a good idea (but again, I will let the
> > experts speak).
>
> As Leo already mentioned in his brutal way, there has been quite some
> research in this area. In Nepomuk itself (not KDE but Java, so only the
> ideas can be reused) Brian Davis from DERI Ireland did ontology extraction
> from plain text using GATE. I don't know if there is any real-world
> application for that, but it could give nice ideas on how to solve smaller
> issues.
>

I'll take a look at that, thanks.



>
> > I think something like this is possible:
> > - Create an ontology for commands. Commands will be related to D-Bus
> > calls, mimetypes and properties.
> > - Properties include things like a "readonly" or "modifydata" type of
> > command.
> >
> > - Commands will be translated by the i18n teams.
> > - The point will be to try to map natural language to the data found in
> > Nepomuk. If done correctly, this can be used to map both to commands and
> > to data, behaving in practice like a desktop search "song from X ......"
> >
> > For instance:
> >
> > In the nepomuk data:
> >
> > - Fetch + email
> > - Send + email to
> > - Forward + email to
>
> This is already a nice idea. Something that would be great is to provide
> this functionality to all runners (I am only brainstorming here). The
> application runner, for example, should also run Konqueror when the user
> types "web browsing" or "internet" (actually the first one already works
> since it is the generic name of Konqueror). I don't know if that is
> something that can be done generically, but at least it's an idea.



I think that already works, since we have a services runner that finds
that Konqueror and Firefox are both a "web browser".



> Again, what I think is most important here is to allow users to use
> different words that mean the same thing, plus some fuzziness. If I
> misspell something, the system should correct me internally.
>

IMHO this would be very difficult with the current runner infrastructure,
unless more and more data is inserted into Nepomuk and the query library gets
fuzziness, error detection, etc. I think that the work on the query library
may be part of what I would need for the natural language interface.
So, yes, you give a good insight: more data in Nepomuk + more functionality
in the query library = more powerful features.
Take, for instance, the services runner I mentioned above. First, create a
Strigi plugin (is Strigi in charge of this?) that parses the service files
and puts the info into Nepomuk. Later, change KServiceTypeTrader and friends
to use Nepomuk instead of whatever they are using now.
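For the "if I misspell something the system should correct me" part, a simple baseline is approximate string matching against the known labels. A minimal sketch using Python's standard `difflib`, assuming a hypothetical label list and a cutoff of 0.75 chosen only for illustration:

```python
import difflib

# Hypothetical set of labels known to the query library.
KNOWN_LABELS = ["sender", "subject", "title", "works at"]


def correct_label(word, labels=KNOWN_LABELS, cutoff=0.75):
    """Return the closest known label for a possibly misspelled word.

    difflib scores candidates with SequenceMatcher.ratio(); anything below
    `cutoff` is treated as "no match" rather than guessed.
    """
    matches = difflib.get_close_matches(word.lower(), labels, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(correct_label("sendr"))   # sender
print(correct_label("Subjct"))  # subject
print(correct_label("xyz"))     # None
```

Character similarity only catches typos; it will not relate "from" to "sender", so it would complement synonym expansion rather than replace it.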



>
> > Simplest case: the user writes "fetch email", we look at the Nepomuk
> > data and execute the action linked to "fetch email".
> > More interesting cases: the user writes "fetch my email", "check email",
> > "check new email", etc. The idea would be to map these texts to "fetch
> > email" and not to the other possible email actions.
> >
> > So there are two steps:
> > 1 - Create the needed data in Nepomuk. It should be easy enough that users
> > can add their own custom commands, or programs can register their own
> > commands when installed.
> > 2 - The "natural language to ontology" system that tries to find the most
> > similar underlying data for a given text.
> >
> > Note that I would also like to work on things like "check email and send
> > it to ....", where "fetch email" and "send email" are defined in the data
> > and the rest of that sentence must be "translated" into a combination of
> > those two actions.
>
> This sounds nice, but I doubt that users will actually type that in. This
> could be interesting in combination with speech recognition, though.
>

Absolutely. That would be the killer application. But speech recognition
works so badly ...
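Step 2 of the quoted proposal, mapping "fetch my email" or "check new email" onto "fetch email" rather than the other email actions, can be sketched as plain word overlap. This is only a baseline under stated assumptions: the command table, the `org.example` D-Bus names, the stopword and synonym lists and the 0.5 threshold are all hypothetical, not part of Nepomuk or KRunner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str            # canonical phrase stored in the data, e.g. "fetch email"
    dbus_call: str       # hypothetical D-Bus target, not a real KDE interface
    modifies_data: bool  # the "readonly" vs "modifydata" property


COMMANDS = [
    Command("fetch email", "org.example.Mail.fetch", False),
    Command("send email", "org.example.Mail.send", True),
    Command("forward email", "org.example.Mail.forward", True),
]

# Hypothetical per-language resources; translators would maintain these.
STOPWORDS = {"my", "the", "a", "new", "all", "please"}
SYNONYMS = {"check": "fetch", "get": "fetch", "download": "fetch",
            "mail": "email", "emails": "email"}


def normalize(text):
    """Lowercase, drop stopwords and map synonyms to canonical words."""
    return {SYNONYMS.get(t, t) for t in text.lower().split() if t not in STOPWORDS}


def best_command(text, commands=COMMANDS, min_score=0.5):
    """Pick the command whose normalized words overlap most (Jaccard index)."""
    query = normalize(text)
    best, best_score = None, 0.0
    for cmd in commands:
        target = normalize(cmd.name)
        union = query | target
        score = len(query & target) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cmd, score
    return best if best_score >= min_score else None


for phrase in ["fetch my email", "check new email", "send email to bob", "play music"]:
    cmd = best_command(phrase)
    print(phrase, "->", cmd.name if cmd else None)
```

The `modifies_data` flag is one way a runner could decide to ask for confirmation before executing a command. Composite requests such as "check email and send it to ...." are out of reach for this kind of bag-of-words scoring.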


>
> Also, I am not so sure that users will want to perform such actions through
> the runner in general. I doubt that I would type "send email to". I find
> the current way of just typing the recipient's name much more convenient.
> Thus, I think that search and annotation are more important. I already
> mentioned search at the beginning of this chaotic email.
> Annotation means something different. In
> playground/base/nepomuk-kde/annotationplugins you will find the beginnings
> of the Nepomuk PIMO annotation plugins. What you have there is the
> possibility to annotate things (files, contacts, emails, web pages, and so
> on) with types (e.g. "this webpage represents a Company" or "this contact
> is a Friend" or "this contact is a Developer") and with arbitrary other
> things (creating relations between things like "X works at company Y").
> And my personal favorite: the geonames annotation plugin, which allows
> annotating things with countries and cities fetched directly from the
> geonames web service (like "A lives in city X").


I have been yearning for years for a feature that automatically annotates my
files with the address I download them from.
All that data will be of extraordinary value when searching, making
inferences, etc.


> Here it could also be very interesting to use some algorithms to try to
> guess what the user means. At the moment, if you type "worksat:mandriva",
> you get a different relation than with "works at:mandriva" or
> "employed by:mandriva". It would be great if all of these could be matched
> to the same relation.
>

It will be needed so that users don't end up with a trillion different
annotation schemes.
It is also a very interesting problem (given the lack of external linguistic
resources, etc.).
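A cheap first step toward matching "worksat", "works at" and "employed by" to the same relation is to normalize labels before comparing them. A minimal sketch, assuming a hand-maintained list of paraphrases (the `worksAt` relation and its list are hypothetical): case and spacing differences are removed mechanically, while genuine paraphrases still need a lexical resource or learned similarity, which is where the research problem lies.

```python
import re

# Hypothetical canonical relation and known paraphrases of its label.
RELATION_PARAPHRASES = {
    "worksAt": ["works at", "employed by", "works for", "employee of"],
}


def _key(label):
    """Collapse case, whitespace and punctuation: "Works at" -> "worksat"."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def build_relation_index(paraphrases):
    """Map every normalized label and paraphrase to its canonical relation."""
    index = {}
    for relation, phrases in paraphrases.items():
        index[_key(relation)] = relation
        for phrase in phrases:
            index[_key(phrase)] = relation
    return index


RELATION_INDEX = build_relation_index(RELATION_PARAPHRASES)


def canonical_relation(label, index=RELATION_INDEX):
    """Return the canonical relation for a user-typed label, or None."""
    return index.get(_key(label))


for typed in ["worksat", "works at", "employed by", "lives in"]:
    print(typed, "->", canonical_relation(typed))
```

Unknown paraphrases fall through to None, so in practice this would be combined with fuzzy matching and synonym expansion rather than used on its own.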


>
> Please let me know if I am talking about things that are interesting to you
> at all.
>

Yes. Absolutely!
So many interesting problems!


>
> > Comments very much ... needed :P
> >
> > On Tue, Oct 28, 2008 at 3:06 AM, Sebastian Trüg <[EMAIL PROTECTED]> wrote:
> > > Just to quickly tell you that I am very interested: how about desktop
> > > search?
> > > (I know this is stupid and boring but just to get the ball rolling).
> > >
> > > Or maybe text analysis for keyword extraction or even concept extraction
> > > (research has already been done in Nepomuk).
> > >
> > > Ok, more tomorrow...
> > >
> > > Cheers,
> > > Sebastian
> > >
> > > On Saturday 25 October 2008 09:35:09 Jordi Polo wrote:
> > > > Hello,
> > > >
> > > > I already commented on this in kde-devel and kde-core-devel, but I
> > > > think it is worth commenting on it here as well.
> > > > I may have the opportunity to choose a project to work on for
> > > > something like one year (working on it means 8 hours/day most days of
> > > > the year). I'd like it to be KDE related.
> > > > The problem is that I have to convince my teacher about it.
> > > > The conditions are:
> > > > - It must be natural language processing related
> > > > - It should address some fundamental problem (something that wasn't
> > > > possible before becomes possible; not only in the KDE or FOSS world
> > > > but in general)
> > > >
> > > > I will send an email shortly (if they don't cut the servers too soon)
> > > > with what I have already proposed, because it may be an interesting
> > > > project and I want to discuss it by itself.
> > > > But in this thread I want to gather whatever crazy ideas you may have.
> > > > Or maybe something in the TODO.
> > > >
> > > > Natural language processing is closely related to clustering of data,
> > > > classification of data and even pattern recognition.
> > > > It is also, of course, related to processing text, getting summaries
> > > > or conclusions.
> > > >
> > > > And remember, one year or maybe even more is a lot of time... Anything
> > > > that you may think of, please tell me.
> > >
> > > _______________________________________________
> > > nepomuk-kde mailing list
> > > nepomuk-kde@semanticdesktop.org
> > > http://lists.semanticdesktop.org/mailman/listinfo/nepomuk-kde
>



-- 
Jordi Polo Carres
NLP laboratory - NAIST
http://www.bahasara.org
