Hello everyone!

Let me introduce you to a project that I am currently working on.

The goal of the project is to provide an easy way to search the KDE code and translation repositories. I believe this kind of infrastructure would help newcomers easily obtain valuable information about the work of the community. For example:

   - which projects exist

   - which ones are the most active

   - how developers describe their work on them

   - which developers currently work on them

To make a long story short, I was thinking that a Google-like search engine would make it easier for newcomers to get on board with KDE.

So, I ended up with a solution that:

1. Fetches git and svn commit messages from the kde-commits mailing list

2. Parses each message and creates a JSON file that contains the following information:

- commit subject

- commit message

- author

- project

- commit date

- isrevision (does a related Phabricator task exist?)

- istranslation (is it a translation commit?)

- fixesbug (whether the commit is bug-related)

The related code can be found here: https://github.com/dimkard/kde-commits-solr

3. Loads the JSON record set into an Apache Solr instance

4. On top of Apache Solr, Banana (a port of Kibana for Solr) has been added. A custom search panel has been created to provide fuzzy searches against the KDE repositories.
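
To make step 2 more concrete, here is a minimal Python sketch of how one parsed commit could be turned into a JSON record with the fields listed above. It is not the code from the repository. The function name, the field spellings, the sample data and the three heuristics (a "BUG:" line for fixesbug, a "Differential Revision:" line for isrevision, an "l10n" project prefix for istranslation) are assumptions made only for illustration.

```python
import json
import re

# Assumed heuristics; the rules used by the real parser may differ.
REVISION_RE = re.compile(r"^Differential Revision:\s*\S+", re.MULTILINE)
BUG_RE = re.compile(r"^BUG:\s*\d+", re.MULTILINE)


def build_record(subject, body, author, project, commit_date):
    """Build one record with the fields listed in step 2."""
    return {
        "subject": subject.strip(),
        "message": body.strip(),
        "author": author,
        "project": project,
        "commit_date": commit_date,
        # True when the message points to a review on Phabricator.
        "isrevision": bool(REVISION_RE.search(body)),
        # Guess: translation commits live in projects named l10n*.
        "istranslation": project.startswith("l10n"),
        # True when the message contains a "BUG: <number>" line.
        "fixesbug": bool(BUG_RE.search(body)),
    }


# Made-up sample input, only to show the shape of the output.
sample = build_record(
    subject="Show previews in the file dialog",
    body=(
        "Adds previews.\n\n"
        "BUG: 12345\n"
        "Differential Revision: https://phabricator.kde.org/D1\n"
    ),
    author="Jane Developer",
    project="kio",
    commit_date="2018-04-25T10:00:00Z",
)
print(json.dumps(sample, indent=2))
```

A list of such records, dumped to a JSON file, is the kind of record set that step 3 would then load into Solr.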


Moreover, it could also be useful for KDE writers/promoters who want a clear view of current development, in either code or translations: the new features, the bug-fixing work, etc.


To better illustrate the tool, let's simulate the creation of a post like https://pointieststick.wordpress.com/2018/04/29/this-week-in-usability-productivity-part-16-everything-else/ , using the functionality this solution offers.

First, the promoter wants to get more information about, and add references to, the improvements of the Open/Save dialog project:

/Open/Save dialog project/

/The dialogs now display previews for the same assortment of file types as Dolphin does (Alex Nemeth)/

/Grid Spacing in icons view has been tightened up to match Dolphin, allowing more to be shown in the window (Alex Nemeth)/


If the writer remembers the name of the committer and knows that a related bug report exists, the facet on the left can be used, with the relevant time period also set (top-left):

https://framapic.org/z4PtCZxEul5K/L3tJZ8visR4I.png

https://framapic.org/DnveENis7bEa/BsKkske4RVPz.png

The records returned are:

https://framapic.org/8fv0crGCijf6/cPsbeZWO1CJH.png

so the commit in question has been successfully found.
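
For those who prefer to see it as a plain Solr request, the same kind of filtered search could be sketched roughly like this in Python. The parameters (q, fq, facet, facet.field, wt) are standard Solr request parameters. The base URL, the core name "commits", the field names and the date range are assumptions of mine, and Banana may well build its requests differently.

```python
from urllib.parse import urlencode


def build_select_url(base_url, author, start, end):
    """Build a Solr /select URL filtered by author, bug flag and period."""
    params = [
        ("q", "*:*"),
        # Each fq narrows the result set without affecting scoring.
        ("fq", f'author:"{author}"'),
        ("fq", "fixesbug:true"),
        ("fq", f"commit_date:[{start} TO {end}]"),
        # Ask Solr for per-author counts, like the facet on the left.
        ("facet", "true"),
        ("facet.field", "author"),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core name and time period.
url = build_select_url(
    "http://localhost:8983/solr/commits",
    "Alex Nemeth",
    "2018-04-22T00:00:00Z",
    "2018-04-29T23:59:59Z",
)
print(url)
```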


If no committer name is available, the writer may search for something like:

https://framapic.org/wYn063MH0jPY/4zEdV8ngIidS.png

Then, following the search suggestion

https://framapic.org/owrsQQ2a5HVW/gWUiTNz7xWTB.png

the relevant commit will be returned as the top result:

https://framapic.org/bNDgq0cbR7J7/xVd3HJqQ40Kv.png
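
As a side note on the fuzzy part: in Solr's standard query syntax, a fuzzy term is written with the "~" operator, optionally followed by the maximum edit distance. A minimal Python sketch of turning free text into such a query could look like the following. The function name and the default field "message" are hypothetical, and I am not claiming the custom panel builds its queries exactly this way.

```python
import re


def fuzzy_query(text, field="message", max_edits=1):
    """Turn free text into a Solr query where every word is fuzzy."""
    # Keep only word characters so no query syntax leaks into the terms;
    # lowercasing also avoids the reserved words AND, OR and NOT.
    words = re.findall(r"\w+", text.lower())
    return " AND ".join(f"{field}:{word}~{max_edits}" for word in words)


print(fuzzy_query("grid spacing"))
```

With max_edits=1, a word that is one typo away from an indexed term can still match.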


The same applies to the second search:

https://framapic.org/qpddu38zvRJF/ssQO4MBEWt6s.png

since the relevant commit is returned as well:

https://framapic.org/g31g6x5mIOxR/5PNUInPMNxYw.png


Moreover, although this is not its primary role, the solution provides some useful interactive visualization tools. For example, searching for work on projects like plasma-phone-components, plasma-settings, plasma-mobile and kirigami would provide useful information about the work on Plasma Mobile. So, a related promo article could be accompanied by some useful statistics and references to real Plasma Mobile commits, like this:

https://framapic.org/2RW8LlxCjYkh/LbkUnVQZTyZV.png


In the future, such a solution could be extended to index Bugzilla data as well. As a result, reports about possible duplicates could be generated automatically and, why not, a fuzzy search engine could be offered to bug reporters, improving the reporting experience and avoiding duplicates and frustration about irrelevant results.

Nevertheless, there is a set of factors that should be considered as well. First, the number of commits in a project is just one indicator, among many others, of the project's activity. A lot of work may happen behind the scenes, in terms of communication, design, testing etc., and this work may land as a single commit or just a few. So, treating all commits as equal is a trap. In addition, since the tool counts the commits of each developer, we should think twice about the psychological effects such a tool may have on contributors.

Do you think that such a tool could help the KDE community? I look forward to hearing your thoughts, since I am still not convinced that working on this would really help the KDE ecosystem.

PS: We may look at alternatives to the technologies involved. I’ve opted for the aforementioned ones since I have already worked with them in the past.

PS1: If similar projects that I am not aware of already exist in KDE, we may consider using them instead of this approach (or joining efforts if they are compatible). My intention is just to start a discussion about how big data, indexing and fuzzy searching may improve onboarding and "promotion" work.

Dimitris
