Re: [Nepomuk] The zombie processes bug ( 302143 )

2012-12-15 Thread Vishesh Handa
Hey David.

Do you think you could please look at this?


On Wed, Dec 12, 2012 at 7:58 AM, Simeon Bird bla...@gmail.com wrote:

 ( the report is https://bugs.kde.org/show_bug.cgi?id=302143 )

 Over the last couple of days (not sure why; I think it was somehow triggered
 by the virtuoso deadlocks Vishesh posted a patch for recently) I started
 hitting the Nepomuk zombie processes bug, so I figured this was a good
 opportunity to debug it.

 Turns out the root cause is a (quite silly) QProcess bug. I found the
 source here:
 http://qt.gitorious.org/qt/qt/blobs/4.8/src/corelib/io/qprocess_unix.cpp
 The short version is: QProcess doesn't check errors properly.

 The longer version:

 When QProcess::start() is called, Qt creates a pipe to the process to get
 its exit value and output.
 It does this with qt_create_pipe, which calls qt_safe_pipe. qt_safe_pipe,
 on failure, returns -1.
 If this happens, qt_create_pipe fails, sets errno, prints a warning:
 [/usr/bin/nepomukservicestub] QProcessPrivate::createPipe: Cannot create
 pipe 0x1987228: Too many open files
 and returns void, carefully ignoring the error.

 The calling function, QProcessPrivate::startProcess, does not check errno,
 and thus continues on its merry
 way assuming the pipe has been created successfully, and creates a
 QSocketNotifier with it.
 Since the pipe is not valid, this fails and prints a warning:
 [/usr/bin/nepomukservicestub] QSocketNotifier: Invalid socket specified

 The calling function again does not check for an error, continues on its
 merry way, and forks off the child process (incidentally obliterating the
 value of errno from qt_create_pipe).
 Note that since the child process is actually created correctly, no
 QProcess error is set,
 so we can't fix it by checking for error().

 The child process then has no way to pass its exit value to the calling
 process, since the
 communication pipes it would normally use do not exist, and thus when it
 exits it becomes
 a zombie.

 As a bonus, once the first timeout timer for a broken process fires,
 waitForFinished is called, which crashes because it tries to wait on a
 pipe that does not exist.
 (This was reported with a patch a year ago, but not fixed:
 https://bugreports.qt-project.org/browse/QTBUG-18934 )

 There is another KDE bug which seems to have the same root cause:
 https://bugs.kde.org/show_bug.cgi?id=252602

 So far as I can see, this really needs to be fixed in QProcess.
 The fix would, I guess, make qt_create_pipe return an integer, then have
 startProcess check the return value, set processError, and abort.

 Can this be done in a reasonable timeframe? Does anyone know how to submit
 Qt patches?

 Simeon

 ___
 Nepomuk mailing list
 Nepomuk@kde.org
 https://mail.kde.org/mailman/listinfo/nepomuk




-- 
Vishesh Handa


Re: [Nepomuk] Review Request: Make ${NEPOMUK_CORE_DBUS_INTERFACES_DIR} an absolute path; fixes problem building kde-runtime on WinXP

2012-12-15 Thread Vishesh Handa

---
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/107665/#review23501
---


Sorry about the late response. I really have no idea if this is the correct way 
to fix this. I'll try to investigate.

- Vishesh Handa


On Dec. 11, 2012, 12:39 p.m., Thomas Friedrichsmeier wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://git.reviewboard.kde.org/r/107665/
 ---
 
 (Updated Dec. 11, 2012, 12:39 p.m.)
 
 
 Review request for kdewin and Nepomuk.
 
 
 Description
 ---
 
 Frankly, I am not at all sure whether this is the correct place to fix the
 issue.
 Either way, without this, building kde-runtime on WinXP fails, because
 calls like the following (in kde-runtime/nepomuk/kcm/CMakeLists.txt):
   qt4_add_dbus_interface(kcmnepomuk_SRCS 
 ${NEPOMUK_CORE_DBUS_INTERFACES_DIR}/org.kde.NepomukServer.xml 
 nepomukserverinterface)
 will look for the interface in a location _relative_ to the current sources. 
 Sorry, I forgot to make a copy of the exact error message.
 
 Making ${NEPOMUK_CORE_DBUS_INTERFACES_DIR} an absolute path successfully
 works around this, but again, I'm not sure whether that is the correct fix.
 
 
 Diffs
 -
 
   NepomukCoreConfig.cmake.in 81c084b50ea065d98798bf86e698dfdf1c8284d7 
 
 Diff: http://git.reviewboard.kde.org/r/107665/diff/
 
 
 Testing
 ---
 
 With the patch to nepomuk-core, kde-runtime compiles with MinGW4 on WinXP. 
 Previously it did not.
 
 
 Thanks,
 
 Thomas Friedrichsmeier
 




Re: [Nepomuk] The zombie processes bug ( 302143 )

2012-12-15 Thread David Faure
On Saturday 15 December 2012 16:12:10 Vishesh Handa wrote:
 Hey David.
 
 Do you think you could please look at this?

This is Oswald's area of expertise; I have forwarded Simeon's mail to him.

-- 
David Faure, fa...@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5



[Nepomuk] [RFC] Simplify Nepomuk Graph handling

2012-12-15 Thread Vishesh Handa
Hey everyone,

This is another one of those big changes that I have been thinking about
for quite some time. This email contains a number of different proposals
which together add up to a much simpler system with the same functionality.

Graph Introduction
---

For those of you who don't know about graphs in Nepomuk, please read [1];
it serves as a decent introduction to where graphs are used. Currently, we
create a new graph for each data-management command.

What does this provide?
--

We currently use graphs for two features:

1. Remove Data By Application
2. Backup

What information do we store?
---

1. Creation date of each graph
2. Modification date of each graph (always the same as the creation date)
3. Type of the graph: Normal or Discardable
4. Which application maintains it

(1) and (2) currently serve no purpose, and never have. They are just nice
to have; I cannot name a single use case for them, except that they let us
see when a statement was added.

(3) is what powers Nepomuk Backup. We do not back up everything, only the
data that is not discardable, so things like indexing information are not
saved. Currently this system is slightly broken: one cannot simply filter
out the Discardable data, as the remainder includes things like the
ontologies, so the queries get quite complicated. Plus, one still needs to
save certain information from the Discardable data, such as the rdf:type,
nao:created, and nao:lastModified, which makes the query even more complex.
On my machine, with some 10 million triples, creating a backup takes a
sizeable amount of time (over 5 minutes) and a lot of CPU.

The current query:

select distinct ?r ?p ?o ?g where {
graph ?g { ?r ?p ?o. }
?g a nrl:InstanceBase .
FILTER( REGEX(STR(?r), '^nepomuk:/(res/|me)') ) .
FILTER NOT EXISTS { ?g a nrl:DiscardableInstanceBase . }
} ORDER BY ?r ?p

Plus, it requires additional queries to back up the rdf:type,
nao:lastModified, and nao:created.

Maybe it would be simpler if we did not make this distinction? Instead, we
could back up everything (which is really fast) and simply discard the data
for files that no longer exist during restoration. It would save users the
trouble of re-indexing their files as well. More importantly, it might save
them the trouble of re-indexing their email, which is a very slow process.

Also, right now one can only set the graph via StoreResources, and not via
any other Data Management command.



(4) is the most important reason for graphs. It allows us to know which
application added the data. Things start to get a little messy when two
applications add the same data. In that case, those statements need to be
split out of their existing graph into a new graph maintained by both
applications. This is expensive.

I'm proposing that instead of splitting the statement out of the existing
graph, we just create a duplicate of the statement with a new graph,
containing the other application.

For example:

Before:

graph G1 { resA a nco:Contact . }
G1 nao:maintainedBy App1 .
G1 nao:maintainedBy App2 .

After:

graph G1 { resA a nco:Contact . }
graph G2 { resA a nco:Contact . }
G1 nao:maintainedBy App1 .
G2 nao:maintainedBy App2 .

The advantage of this approach is that it would simplify some of the
extremely complex queries in the DataManagementModel, which would result in
a direct performance improvement. It would also solve some of the ugly
transaction problems we have when two commands access the same statement
and one of them removes the data in order to move it to another graph. This
has happened to me a couple of times.

---

My third proposal: given that the modification and creation dates of a
graph provide no benefit, perhaps we shouldn't store them at all? Unless
there is a proper use case, why go through the added effort? Normally,
storing a couple of extra properties isn't a big deal, but if we do not
store them, then we can effectively eliminate the need to create a new
graph for each data-management command.

With this, one would need just one graph per application, in which all of
its data would reside. We wouldn't need to check for empty graphs or
anything, and it would also reduce the number of triples in the database,
which can get alarmingly high.

This seems like a pretty good system to me: it provides all the benefits
with none of the drawbacks.

What do you guys think?

[1] http://techbase.kde.org/Projects/Nepomuk/GraphConcepts

-- 
Vishesh Handa