Page "Proposals/BEP-0014" was changed by antonia.horincar
Diff URL: <https://issues.apache.org/bloodhound/wiki/Proposals/BEP-0014?action=diff&version=2>
Revision 2
Changes:
-------8<------8<------8<------8<------8<------8<------8<------8<--------
Index: Proposals/BEP-0014
=========================================================================
--- Proposals/BEP-0014 (version: 1)
+++ Proposals/BEP-0014 (version: 2)
@@ -1,69 +1,91 @@
-
-= BEP <BEP number> : <BEP title> #overview
+= BEP 14 : Add Apache Solr to Bloodhound #overview
 
 [[PageOutline]]
 
-|| '''BEP''' || <BEP number> ||
-|| '''Title''' || <BEP title> ||
-|| '''Version''' || <leave blank> ||
-|| '''Last-Modified''' || <leave blank> ||
-|| '''Author''' || Author With Email <[email protected]>, Author Name Only, or The 
Bloodhound project (see [wiki:/Proposals#bep-header-preamble BEP preamble 
explained]) ||
+|| '''BEP''' || 14 ||
+|| '''Title''' || Add Apache Solr to Bloodhound ||
+|| '''Version''' ||  ||
+|| '''Last-Modified''' ||  ||
+|| '''Author''' || Antonia Horincar <[email protected]> ||
 || '''Status''' || Draft ||
-|| '''Type''' || <BEP type (see [wiki:/Proposals#bep-types BEP types 
explained])> ||
+|| '''Type''' || Standards Track ||
 || '''Content-Type''' || [wiki:PageTemplates/Proposals text/x-trac-wiki] ||
-|| '''Created''' || <leave blank> ||
-|| '''Post-History''' || <leave blank> ||
+|| '''Created''' ||  ||
+|| '''Post-History''' ||  ||
 
 ----
 
 == Abstract #abstract
 
-<Delete text in this section and add a short (~200 word) description of the 
technical issue being addressed. Take a look at sample abstract below>
+The Bloodhound Search plugin supports different search backends, but only 
Whoosh has been implemented so far [1]. Even though Whoosh (a search backend 
implemented in pure Python [2]) works well for small amounts of data, it does 
not scale well to large numbers of searchable items. Apache Solr is a search 
platform focused on delivering enterprise-class, high-performance search 
functionality [3]. Solr is written in Java and runs as a standalone full-text 
search server within a servlet container such as Apache Tomcat or Jetty. Solr 
uses the Lucene Java search library at its core for full-text indexing and 
search, and has REST-like HTTP/XML and JSON APIs that make it usable from most 
popular programming languages [4]. It scales well and performs well under heavy 
load, which makes it a good fit for services with many users and large amounts 
of data.
 
-This template provides a boilerplate or sample template for creating your
-own BEPs.  In conjunction with the [wiki:/Proposals general content 
guidelines] and the [wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP 
guidelines]  
-, this should make it easy for you to conform your own
-BEPs to the format outlined below. See [#howto How to Use This Template] for 
further instructions.
+== Rationale #rationale
 
-**Note**: if you are reading this template via the web, you should first try 
to create a new wiki page by selecting `ProposalsRst` |page template guide|.  
**DO NOT EDIT THIS WIKI PAGE IN ORDER TO CREATE A NEW BEP! **
+There are several ways to add Solr support to an existing Python service, 
since several Python libraries provide REST-ful interaction with a Solr server. 
 
-If you would prefer not to use WikiFormatting markup in your BEP, please see  
[wiki:/Proposals/Formats/RestructuredText reStructuredText BEP guidelines].
+Two of the most popular Python libraries for working with Solr are Sunburnt 
and solrpy. Both are open source, well-documented libraries that use Solr's 
REST-like JSON API, and they are similar in terms of speed and performance.
 
-== Motivation ==
+Sunburnt is more extensible and supports a wider variety of operations, while 
solrpy is more restrictive in this respect. Sunburnt allows query chaining such 
as:
+`query.find_by_phrase(title="Bloodhound").paginate(4, 40).execute()`
+It also supports more complex queries than solrpy, which makes it the better 
choice for the long-term development of Bloodhound. Important features such as 
pagination and faceting can be implemented with Sunburnt's built-in methods. 
Spell checking and suggestions can be achieved by extending the default 
Sunburnt functionality, as shown in this example [4].
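To make the idea of query chaining concrete, here is a minimal sketch of a chainable, immutable query builder that turns chained calls into Solr `/select` request parameters (`q`, `start`, `rows`, `facet`, `facet.field`). The class `SolrQuery` and its method names are hypothetical and only echo the chaining style above; this is not Sunburnt's API, whose exact methods should be checked in its documentation.

```python
class SolrQuery:
    """Hypothetical immutable query builder illustrating query chaining.

    Every method returns a new SolrQuery, so partial queries can be reused;
    params() shows the Solr /select parameters the chain would produce.
    """

    def __init__(self, terms=(), start=0, rows=10, facets=()):
        self._terms = tuple(terms)
        self._start = start
        self._rows = rows
        self._facets = tuple(facets)

    def _copy(self, **changes):
        state = dict(terms=self._terms, start=self._start,
                     rows=self._rows, facets=self._facets)
        state.update(changes)
        return SolrQuery(**state)

    def query(self, **fields):
        # Each keyword becomes a field:"phrase" clause, ANDed with earlier ones.
        # Quotes inside values are not escaped here, for brevity.
        clauses = ['%s:"%s"' % (k, v) for k, v in sorted(fields.items())]
        return self._copy(terms=self._terms + tuple(clauses))

    def paginate(self, start=0, rows=10):
        return self._copy(start=start, rows=rows)

    def facet_by(self, field):
        return self._copy(facets=self._facets + (field,))

    def params(self):
        params = {"q": " AND ".join(self._terms) or "*:*",
                  "start": self._start, "rows": self._rows}
        if self._facets:
            params["facet"] = "true"
            params["facet.field"] = list(self._facets)
        return params
```

For example, `SolrQuery().query(title="Bloodhound").paginate(4, 40).facet_by("status").params()` yields `q` set to `title:"Bloodhound"`, `start` 4 and `rows` 40, plus a facet on `status`. Keeping the builder immutable means a base query can be shared between pages without one caller's pagination leaking into another's.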
 
-<The motivation is critical for BEPs that want to change the copy of ''Trac'' 
patched using vendor branch . It should clearly explain why the existing 
''Bloodhound'' solution is inadequate to address the problem that the ''BEP'' 
solves. ''BEP'' submissions without sufficient motivation may be rejected 
outright. >
+On the other hand, solrpy has no external dependencies and works out of the 
box. Sunburnt requires httplib2 [5] (or requests [6]) and lxml [7], and the 
mx.DateTime [8] and pytz [9] libraries are strongly recommended for handling 
dates and time zones. 
+
+Since Sunburnt exposes more of Solr's search features, it will be used for 
this project; switching between Sunburnt and solrpy remains easy during the 
early development stages.
 
 == Proposal #proposal
 
-<The technical specification should describe any new features , detail its 
impact on the components architecture , mention what plugins will be included 
as a result , whether they are hosted by ​[http://trac-hacks.org 
trac-hacks.org] or not , and any other relevant technical subject . The 
specification should be detailed enough to allow competing, interoperable 
implementations for any of the current supported database platforms (e.g. 
''SQLite'', ''Postgres'', ''MySQL'') and server technologies (e.g. ''Apache 
HTTPD server'', ''nginx'', ''mod_wsgi'', ''CGI'').. >
+The first step in integrating Solr as a search backend is installing the Solr 
service. Solr must run inside a Java servlet container such as Apache Tomcat or 
Jetty. Installing Tomcat on Linux and Mac machines is straightforward [10] [11], 
and so is deploying Solr on a Tomcat instance.
 
-== Rationale #rationale
+Once the Tomcat server is up and running, the next step is to create Solr 
schemas for the searchable Bloodhound resources (e.g. tickets, comments, 
milestones). This is done through the schema.xml files in the Solr 
configuration folders, which must describe every field of the resources that 
will be indexed. This requirement distinguishes Solr from Lucene, which does 
not require a schema, and makes searching easy to configure, since only the 
defined fields are searchable. 
 
-<The rationale fleshes out the specification by describing what motivated the 
design and why particular design decisions were made. It should describe 
alternate designs that were considered and related work, e.g. how the feature 
is supported in other issue trackers or ''Trac'' hacks . The rationale should 
provide evidence of consensus within the community and discuss important 
objections or concerns raised during discussion. Take a look at sample 
rationale below>
+The Solr schema will be generated using a trac-admin task, by populating a 
pre-defined XML template with the attributes of a specified resource.
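The schema-generation step could look roughly like the sketch below: it fills a pre-defined template with one `<field>` element per resource attribute, using the standard Solr `schema.xml` attributes (`name`, `type`, `indexed`, `stored`, `required`) and `<uniqueKey>`. The template, the `TICKET_FIELDS` mapping and the `generate_schema` function are hypothetical; the field types are assumed to come from Solr's default example schema, and a real template would also have to declare those field types. In the plugin, such a function could be exposed as a trac-admin command through Trac's `IAdminCommandProvider` extension point.

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-defined template: a minimal schema skeleton. A real one
# would also contain the <types> section declaring the field types used below.
SCHEMA_TEMPLATE = """<schema name="bloodhound" version="1.5">
  <fields/>
  <uniqueKey>id</uniqueKey>
</schema>"""

# Hypothetical mapping of ticket attributes to field types assumed to exist
# in Solr's default example schema.
TICKET_FIELDS = {
    "id": "string",
    "summary": "text_general",
    "description": "text_general",
    "status": "string",
    "milestone": "string",
    "time": "tdate",
}


def generate_schema(fields, template=SCHEMA_TEMPLATE):
    """Populate the template with one <field> element per resource attribute."""
    root = ET.fromstring(template)
    fields_el = root.find("fields")
    for name, solr_type in fields.items():
        ET.SubElement(fields_el, "field", {
            "name": name,
            "type": solr_type,
            "indexed": "true",
            "stored": "true",
            # The unique key must be present in every indexed document.
            "required": "true" if name == "id" else "false",
        })
    return ET.tostring(root, encoding="unicode")
```

Calling `generate_schema(TICKET_FIELDS)` returns the XML text to be written to the core's `schema.xml`; other resources (milestones, wiki pages) would only need their own attribute mapping.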
 
-''BEP'' submissions come in a wide variety of forms, not all adhering to the 
format guidelines set forth below. Use this template, in conjunction with the 
[wiki:/Proposals general content guidelines] and the 
[wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines], to 
ensure that your ''BEP'' submission is easy to read and understand.
+Once Solr is configured and running, the next step is to make the Bloodhound 
instance use the Solr service for indexing and searching. This will be done 
with the Sunburnt library, which provides methods for interacting with the Solr 
JSON API in a REST-ful way. Next, Solr will be pre-populated with the existing 
data in the Bloodhound database by a script that iterates over the searchable 
resources and adds their field values to Solr. Consequently, the Bloodhound 
code will need to call methods on a Sunburnt instance.
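A pre-population script of this kind might be structured as in the sketch below: it reads rows lazily, sends them to Solr in fixed-size batches, and commits once at the end so the new documents become searchable. `FakeSolrInterface` is a stand-in so the sketch runs without a Solr server; it only models `add()` and `commit()`, the two Sunburnt `SolrInterface` calls assumed here. The row layout, the `ticket:<id>` unique-key scheme and the function names are hypothetical.

```python
from itertools import islice


class FakeSolrInterface:
    """Stand-in for a sunburnt.SolrInterface so the sketch runs without Solr.

    Only add() and commit() are modelled; it records batch sizes and keeps
    documents in a dict keyed by their unique "id" field.
    """

    def __init__(self):
        self.docs = {}
        self.batches = []
        self.commits = 0

    def add(self, docs):
        self.batches.append(len(docs))
        for doc in docs:
            self.docs[doc["id"]] = doc

    def commit(self):
        self.commits += 1


def ticket_to_doc(row):
    """Map a hypothetical (id, summary, status) ticket row to a Solr document."""
    ticket_id, summary, status = row
    # Prefixing the id keeps keys unique across resource types.
    return {"id": "ticket:%d" % ticket_id, "type": "ticket",
            "summary": summary, "status": status}


def prepopulate(solr, rows, batch_size=100):
    """Send all rows to Solr in batches, then commit once at the end."""
    rows = iter(rows)
    total = 0
    while True:
        batch = [ticket_to_doc(row) for row in islice(rows, batch_size)]
        if not batch:
            break
        solr.add(batch)
        total += len(batch)
    solr.commit()  # added documents become searchable after a commit
    return total
```

In the plugin, `rows` would come from a query against the Bloodhound database (for example the `ticket` table), and batching keeps memory use and the number of HTTP requests bounded for large installations.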
 
-This template allows to create BEPs and is very similar to 
[http://www.python.org/dev/peps/pep-0012 PEP 12] . However it has been 
optimized by moving long explanations to the 
[wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines] . If you 
are interested take a look at the  [?action=diff&old_version=1 differences]. 
The goal is to redact new BEPs just by following in-line instructions between 
angle brackets (i.e. **<** **>**) . Even if this will allow to write BEPs 
faster , it is highly recommended to read the 
[wiki:/Proposals/Formats/WikiFormatting WikiFormatting BEP guidelines] at least 
once in your lifetime to be aware of good practices and expected style rules . 
+To interact with Solr from the Bloodhound instance, the existing callbacks 
[12] for the main searchable Bloodhound objects will be used:
+- on creation, insert a new entry in Solr
+- on deletion, find the corresponding entry in Solr and delete it
+- on edit, find the corresponding entry in Solr, delete it, and add a new 
entry with the updated information
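The three callbacks listed above could be wired up roughly as follows. The method names and arguments mirror Trac's `ITicketChangeListener` (`ticket_created`, `ticket_changed`, `ticket_deleted`); in a real plugin the class would be a Trac `Component` that `implements(ITicketChangeListener)`. `InMemoryIndex` and `Ticket` are hypothetical stand-ins so the sketch runs without Trac or Solr. Note that Solr replaces any existing document with the same unique key on add, so the delete step in `ticket_changed` is kept only to match the behaviour described above.

```python
class InMemoryIndex:
    """Hypothetical stand-in for the Solr connection (not Sunburnt's API)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class Ticket:
    """Hypothetical minimal ticket: an id plus a dict of field values."""

    def __init__(self, ticket_id, **values):
        self.id = ticket_id
        self.values = values


def solr_id(ticket):
    return "ticket:%d" % ticket.id


def to_doc(ticket):
    doc = {"id": solr_id(ticket), "type": "ticket"}
    doc.update(ticket.values)
    return doc


class SolrTicketIndexer:
    """Callbacks named after Trac's ITicketChangeListener methods."""

    def __init__(self, index):
        self.index = index

    def ticket_created(self, ticket):
        self.index.add(to_doc(ticket))

    def ticket_changed(self, ticket, comment, author, old_values):
        # Remove the stale entry, then add one with the current values.
        self.index.delete(solr_id(ticket))
        self.index.add(to_doc(ticket))

    def ticket_deleted(self, ticket):
        self.index.delete(solr_id(ticket))
```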
 
-== How to Use This Template #howto
+Placing the Solr interactions in the callbacks ensures that every successful 
operation in Bloodhound triggers the corresponding Solr operation. 
 
-<BEPs may include further sections. This is an example.>
+When a user performs a search, the results are those returned by the Solr 
instance for the query. All of these operations fit into the existing 
Bloodhound implementation through the ISearchBackend [13] interface, which 
provides methods for every operation described above (adding, editing, 
deleting, and querying). 
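On the query side, the backend has to turn Solr's answer into results Bloodhound can display. The sketch below parses a `/select` response requested with `wt=json`: the total hit count is in `response.numFound`, the matching documents in `response.docs`, and facet counts in `facet_counts.facet_fields`, which Solr's default JSON layout returns as a flat list alternating values and counts. The function name is hypothetical, and mapping the result into the objects `ISearchBackend` actually expects is not shown.

```python
import json


def parse_select_response(body):
    """Turn a Solr /select response (wt=json) into (total, docs, facets)."""
    data = json.loads(body)
    response = data["response"]
    facets = {}
    facet_fields = data.get("facet_counts", {}).get("facet_fields", {})
    for field, flat in facet_fields.items():
        # Default JSON layout alternates values and counts:
        # ["new", 3, "closed", 1] -> {"new": 3, "closed": 1}
        facets[field] = dict(zip(flat[::2], flat[1::2]))
    return response["numFound"], response["docs"], facets
```

Keeping the parsing in one small function isolates the rest of the plugin from Solr's wire format, which also makes it easier to swap Sunburnt for solrpy later.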
 
-Quick edits will consist in following the instructions inside angle brackets 
(i.e. **<** **>**) . That should be everything needed to write new BEPs. To be 
more informed about advanced considerations please read the [wiki:/Proposals 
general content guidelines] and the [wiki:/Proposals/Formats/WikiFormatting 
WikiFormatting BEP guidelines] . If there is no point in including one of the 
sections in this document then feel free to remove it.
+The code will be structured as a plugin that can be enabled or disabled, so 
that administrators can switch between backend engines (for example, if they 
decide that Solr is not suitable for their project).
 
-== Backwards Compatibility #backwards-compatibility
+== Deliverables #deliverables
+- Solr configuration files required to run Solr
+- Schema files for mapping the Bloodhound objects to Solr 
+- A Bloodhound plugin containing the code that establishes the interaction 
between the Bloodhound server and the Solr server
+- Unit tests for the Bloodhound plugin
+- Documentation
+- Further work on Bloodhound’s libraries and packages
 
-<All BEPs that introduce backwards incompatibilities must include a section 
describing these incompatibilities and their severity. The ''BEP'' must explain 
how to deal with these incompatibilities. ''BEP'' submissions without a 
sufficient backwards compatibility treatise may be rejected outright. >
+== Timeframe #timeframe
 
-== Reference Implementation #reference-implementation
+''April 21 - May 18''
+- Research Apache Solr and Sunburnt
+- Collaborate with the Bloodhound community to gain a more in-depth 
understanding of the modules and classes that currently handle searching in 
Bloodhound
+- Draft an implementation design for the plugin
 
-< The reference implementation **must** be completed before any ''BEP'' is 
given status **Final**, but it need not be completed before the ''BEP'' is 
accepted. It is better to finish the specification and rationale first and 
reach consensus on it before writing code. The final implementation **must** 
include test code and documentation appropriate for either the wiki pages in 
''Bloodhound'' users guide or an specific wiki page in the 
[http://issues.apache.org/bloodhound issue tracker] . >
+''May 19 - June 22''
+- Generate the schema file for Solr
+- Use Sunburnt to create a script for pre-populating Solr with the existing 
data in the Bloodhound database
+- Add methods to process new data and index it in Solr
 
-<In order to list tickets related to a given proposal edit sample text 
provided below by including the appropriate **<BEP number>**. Target tickets 
have to be tagged with `bep-<BEP number>` keyword. Do not forget to remove 
curly braces so that the tickets list will be actually rendered.>
+''June 23 - June 26''
+- Midterm evaluation
 
-{{{
-[[Widget(TicketQuery, query="keywords=~bep-<BEP 
number>&col=id&col=summary&col=status&col=priority&col=milestone", title=BEP 
<BEP number> ticket summary)]]
-}}}
+''June 27 - August 10''
+- Implement methods that build Solr queries, and bind them to the existing 
search infrastructure 
+- Develop unit tests for the API
+
+''August 11 - August 17''
+- Refactor code, finish tests and documentation
+
+''August 18 - August 21''
+- Final evaluation
 
 == Resources #resources
 
@@ -73,15 +95,19 @@
 
 == References #references
 
-<List the references included in BEP body>
-
-  1. PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
-     http://www.python.org/dev/peps/pep-0001/
-  2. PEP 9, Sample Plaintext PEP Template, Warsaw
-     http://www.python.org/dev/peps/pep-0009
-  2. PEP 12, Sample reStructuredText ''PEP'' Template, Goodger, Warsaw
-     http://www.python.org/dev/peps/pep-0012/
-  3. http://www.opencontent.org/openpub/
+1. http://stackoverflow.com/questions/3226596/full-text-search-whoosh-vs-solr
+2. http://deeson-online.co.uk/blog/when-use-apache-solr-drupal
+3. http://en.wikipedia.org/wiki/Apache_Solr
+4. https://groups.google.com/forum/#!topic/python-sunburnt/rcbd2yLLUaQ
+5. https://code.google.com/p/httplib2/
+6. http://requests.readthedocs.org/en/latest/
+7. http://lxml.de/
+8. http://www.egenix.com/products/python/mxBase/mxDateTime/
+9. http://pytz.sourceforge.net/
+10. http://wolfpaulus.com/jounal/mac/tomcat7/
+11. https://www.digitalocean.com/community/articles/how-to-install-apache-tomcat-on-ubuntu-12-04
+12. https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/search_resources/ticket_search.py#L45
+13. https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/api.py#L94
 
 == Copyright #copyright
 
-------8<------8<------8<------8<------8<------8<------8<------8<--------

--
Page URL: <https://issues.apache.org/bloodhound/wiki/Proposals/BEP-0014>
Apache Bloodhound <https://issues.apache.org/bloodhound/>
The Apache Bloodhound issue tracker

