Fuzzy searching against KDE repositories

Dimitris Kardarakos Sun, 06 May 2018 08:05:47 -0700

Hello everyone!

Let me introduce you to a project that I am currently working on.

The scope of the project is to provide an easy way to search the KDE code and translation repositories, since I believe that such an infrastructure would help newcomers easily obtain valuable information about the work of the community. For example:


   - which projects exist

   - which ones are the most active

   - how developers describe their work on them

   - which developers currently work on them

To make a long story short, I was thinking that a Google-like search engine would facilitate the onboarding of newcomers to KDE.


So, I ended up with a solution that:

1. Fetches git and svn commit messages from the kde-commits mailing list

2. Parses each message and creates a JSON file that contains the following information:


- commit subject

- commit message

- author

- project

- commit date

- isrevision (does a related Phabricator task exist?)

- istranslation (is it a translation commit?)

- fixesbug (whether the commit is bug-related)

The related code can be found here: https://github.com/dimkard/kde-commits-solr


3. Loads the JSON record set into an Apache Solr instance

4. On top of Apache Solr, Banana (a port of Kibana for Solr) has been added. A custom search panel has been created to provide fuzzy searches against KDE repositories.
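For readers who prefer code, steps 2 and 3 could be sketched roughly as below. This is only an illustration, not the code from the repository linked above. The subject layout "[project] summary", the "l10n" project prefix used to flag translations, the function names and the core URL passed to build_update_request are all assumptions made for the example. The "BUG:" and "Differential Revision:" markers are treated as simple text heuristics.

```python
import json
import re
import urllib.request
from datetime import timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def parse_commit_mail(raw: str) -> dict:
    """Turn one raw commit e-mail into a flat record (all heuristics are assumptions)."""
    msg = message_from_string(raw)
    subject = msg.get("Subject", "")
    body = msg.get_payload()  # a plain, non-multipart mail is assumed

    # Assumed subject layout: "[project] summary"
    match = re.match(r"\[([^\]]+)\]\s*(.*)", subject)
    project, summary = match.groups() if match else ("", subject)

    # Solr date fields expect UTC timestamps ending in "Z"
    # (a Date header with a time zone offset is assumed)
    sent = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc)

    return {
        "subject": summary,
        "message": body.strip(),
        "author": parseaddr(msg.get("From", ""))[0],
        "project": project,
        "date": sent.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Heuristic: a Phabricator link line is present in the body
        "isrevision": "Differential Revision:" in body,
        # Hypothetical rule: translation projects start with "l10n"
        "istranslation": project.startswith("l10n"),
        # Heuristic: a "BUG: <number>" keyword line is present
        "fixesbug": re.search(r"^BUG:\s*\d+", body, re.MULTILINE) is not None,
    }


def build_update_request(core_url: str, records: list) -> urllib.request.Request:
    """Prepare (but do not send) a JSON update request for a Solr core."""
    return urllib.request.Request(
        f"{core_url}/update?commit=true",
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would post the records to the Solr instance. Keeping the request construction separate makes the parsing easy to check without a running server.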

Moreover, it could also be useful for KDE writers/promoters to get a clear view of current development, on either code or translations: the new features, the bug-fixing work, etc.

To better illustrate the tool, let's simulate the creation of a post like https://pointieststick.wordpress.com/2018/04/29/this-week-in-usability-productivity-part-16-everything-else/, leveraging the functionality offered by this solution.

First, the promoter wants to get more information and add references to the Open/Save dialog project improvements:


/Open/Save dialog project/

/The dialogs now display previews for the same assortment of file types as Dolphin does (Alex Nemeth)/

/Grid Spacing in icons view has been tightened up to match Dolphin, allowing more to be shown in the window (Alex Nemeth)/


If the writer remembers the name of the committer and knows that a related bug report exists, the facet on the left can be used and the relevant time period can also be set (top-left):


https://framapic.org/z4PtCZxEul5K/L3tJZ8visR4I.png

https://framapic.org/DnveENis7bEa/BsKkske4RVPz.png

The records returned are:

https://framapic.org/8fv0crGCijf6/cPsbeZWO1CJH.png

so the commit in question has been successfully found.

If no committer name is available, the writer may search for something like:


https://framapic.org/wYn063MH0jPY/4zEdV8ngIidS.png

Then following the search suggestion

https://framapic.org/owrsQQ2a5HVW/gWUiTNz7xWTB.png

the relevant commit is returned as the top result:

https://framapic.org/bNDgq0cbR7J7/xVd3HJqQ40Kv.png


The same applies to the second search:

https://framapic.org/qpddu38zvRJF/ssQO4MBEWt6s.png

since the relevant commit is returned as well:

https://framapic.org/g31g6x5mIOxR/5PNUInPMNxYw.png
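For those who would rather query Solr directly than go through the Banana panel, searches like the ones above might be expressed roughly as in the sketch below. It is one possible way and not necessarily what the custom panel does internally. It uses the fuzzy operator of Solr's standard query parser (term~N) and filter queries (fq) for the facets. The helper names and the core URL are hypothetical, and the Solr field names are assumed to match the JSON fields listed earlier. The words are assumed to be plain alphanumeric terms, so no escaping of query syntax is done.

```python
from urllib.parse import urlencode


def fuzzy_clause(field: str, words: list, distance: int = 1) -> str:
    """Join the words into a query where each one tolerates `distance` edits."""
    return " AND ".join(f"{field}:{word}~{distance}" for word in words)


def build_select_url(core_url, words, author=None, fixes_bug=None, rows=10):
    """Build a /select URL: fuzzy match on the message, optional facet filters."""
    params = [
        ("q", fuzzy_clause("message", words)),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if author is not None:
        # Exact phrase filter on the (assumed) author field
        params.append(("fq", f'author:"{author}"'))
    if fixes_bug is not None:
        # Boolean filter on the (assumed) fixesbug field
        params.append(("fq", f"fixesbug:{str(fixes_bug).lower()}"))
    return f"{core_url}/select?{urlencode(params)}"
```

For example, build_select_url("http://localhost:8983/solr/kde-commits", ["previws", "dialog"], author="Alex Nemeth", fixes_bug=True) would look for commits whose message roughly matches both words, restricted to one author and to bug-related commits.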

Moreover, although this is not its primary role, the solution provides some useful interactive visualization tools. For example, by searching for work on projects like plasma-phone-components, plasma-settings, plasma-mobile and kirigami, the tool provides useful information regarding work on Plasma Mobile. So, a related promo article could be accompanied by some useful statistics and references to real Plasma Mobile commits, like this:


https://framapic.org/2RW8LlxCjYkh/LbkUnVQZTyZV.png

In the future, such a solution could be extended further by indexing Bugzilla data as well. As a result, reports about possible duplicates could be generated automatically and, why not, a fuzzy search engine could be offered to bug reporters, enhancing the reporting experience and avoiding duplicates and frustration about irrelevant results.

Nevertheless, there is a set of factors that should be considered as well. First, the number of commits on a project is just one indicator, among many others, of the activity of a project. A lot of work may happen behind the scenes, in terms of communication, design, testing, etc., and this work may be committed as a single commit or just a few. So, considering all commits as equal is a trap. In addition, since the tool measures the number of commits by each developer, we should think twice about the psychological effects that such a tool may have on contributors.

Do you think that such a tool could help the KDE community? I look forward to hearing your thoughts, since I am still not convinced that working on this would really help the KDE ecosystem.

PS: We may look at alternatives with regard to the technologies involved. I’ve opted for the aforementioned ones since I have already worked with them in the past.

PS1: If similar projects that I am not aware of already exist in KDE, we may consider using them instead of this approach (or joining efforts if they are compatible). My intention is just to start a discussion about how big data, indexing and fuzzy searching may improve onboarding and "promotion" work.


Dimitris
