Re: [Architecture] [C5][APIM] Full Text Search

Rajith Roshan Tue, 24 Jan 2017 03:26:28 -0800

Hi all,

We have integrated full text search into the API Manager C5 server. We
implemented full text search and attribute search for Oracle, MSSQL, MySQL,
Postgres, and H2.


*Full text search*: The search runs across APIs regardless of which
attribute matches. Full text indexes are created on the AM_API table.
The indexes include the columns that are used in the full text
search. The indexing and querying process differs from database to database.
A sample curl full text query is as follows:

curl -H "Authorization: Basic YWRtaW46YWRtaW4=" \
  "http://127.0.0.1:9090/api/am/store/v0.10/apis?query=test&offset=1&limit=10"
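
As a rough illustration of how the query differs from database to database,
here is a minimal Python sketch with one full text predicate per database.
The column names NAME and DESCRIPTION, the FULL_TEXT_SQL table and the
full_text_sql helper are assumptions made for this example; they are not the
actual C5 schema or implementation. Each statement also assumes that the
matching full text index already exists.

```python
# Hypothetical per-database full text queries against AM_API.
# NAME and DESCRIPTION are assumed column names; "?" is the bound search term.
FULL_TEXT_SQL = {
    # MySQL: needs a FULLTEXT index on (NAME, DESCRIPTION).
    "mysql": "SELECT * FROM AM_API "
             "WHERE MATCH(NAME, DESCRIPTION) AGAINST (? IN BOOLEAN MODE)",
    # PostgreSQL: text search vector built on the fly from both columns.
    "postgres": "SELECT * FROM AM_API "
                "WHERE to_tsvector('english', NAME || ' ' || DESCRIPTION) "
                "@@ plainto_tsquery('english', ?)",
    # SQL Server: needs a full text catalog and index on the table.
    "mssql": "SELECT * FROM AM_API WHERE CONTAINS((NAME, DESCRIPTION), ?)",
    # Oracle Text: needs a CONTEXT index on the searched column.
    "oracle": "SELECT * FROM AM_API WHERE CONTAINS(DESCRIPTION, ?, 1) > 0",
    # H2: built-in full text search; 0, 0 means no limit and no offset.
    "h2": "SELECT * FROM FT_SEARCH(?, 0, 0)",
}


def full_text_sql(database):
    """Return the full text query template for the given database type."""
    try:
        return FULL_TEXT_SQL[database.lower()]
    except KeyError:
        raise ValueError(f"no full text query defined for {database!r}") from None
```

Keeping the statements in one lookup table is only one way to isolate the
per-database differences; the search term itself is always passed as a bound
parameter.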

*Attribute search*: This is used to search APIs based on their
attributes (metadata). It is implemented with ordinary SQL "LIKE" queries.
A sample curl command is as follows:

curl -H "Authorization: Basic YWRtaW46YWRtaW4=" \
  "http://127.0.0.1:9090/api/am/store/v0.10/apis?query=name:test,version:8.0.0&offset=1&limit=10"
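
To make the attribute search more concrete, here is a minimal Python sketch
that turns a query string such as "name:test,version:8.0.0" into a
parameterized LIKE query. It runs against an in-memory SQLite table purely
for illustration. The attribute-to-column mapping, the AND combination of
terms and the LIMIT/OFFSET paging syntax are assumptions for this example,
not the actual C5 implementation (Oracle and MSSQL use different paging
syntax).

```python
import sqlite3

# Assumed mapping from searchable attribute to AM_API column.
ALLOWED = {"name": "NAME", "version": "VERSION", "context": "CONTEXT"}


def parse_query(query):
    """Split 'name:test,version:8.0.0' into {'name': 'test', 'version': '8.0.0'}."""
    pairs = {}
    for part in query.split(","):
        key, sep, value = part.partition(":")
        key = key.strip()
        if not sep or key not in ALLOWED:
            raise ValueError(f"unsupported search term: {part!r}")
        pairs[key] = value.strip()
    return pairs


def build_like_query(pairs, offset, limit):
    """Build a parameterized LIKE query; values are never concatenated into SQL."""
    clauses = [f"{ALLOWED[key]} LIKE ?" for key in pairs]
    sql = ("SELECT NAME, VERSION FROM AM_API WHERE " + " AND ".join(clauses)
           + " ORDER BY NAME LIMIT ? OFFSET ?")
    params = [f"%{value}%" for value in pairs.values()] + [limit, offset]
    return sql, params


def search(conn, query, offset=0, limit=10):
    """Run the attribute search and return the matching (NAME, VERSION) rows."""
    sql, params = build_like_query(parse_query(query), offset, limit)
    return conn.execute(sql, params).fetchall()


# Small in-memory table standing in for AM_API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AM_API (NAME TEXT, VERSION TEXT, CONTEXT TEXT)")
conn.executemany(
    "INSERT INTO AM_API VALUES (?, ?, ?)",
    [("test-api", "8.0.0", "/t"), ("testing", "1.0.0", "/x"), ("orders", "8.0.0", "/o")],
)
rows = search(conn, "name:test,version:8.0.0")
```

Only column names from the fixed ALLOWED mapping are placed into the SQL
text, so user input reaches the database only through bound parameters.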

We have carried out latency tests for MSSQL [1], Oracle [2], Postgres [3],
and MySQL [4] databases using different numbers of APIs and different
concurrency levels.
Even with 10,000 APIs, all the databases show manageable latency (10,000
APIs would be a very rare case for a single tenant).
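
For anyone who wants to repeat this kind of measurement without an external
load generator, here is a minimal Python sketch that calls a search function
from several threads and summarizes the latencies. The measure_latency and
summarize helpers and the choice of mean and 95th percentile are assumptions
for this example; this is not the setup that produced the numbers in [1]-[4].

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def measure_latency(search, concurrency, requests_per_worker):
    """Call `search` from `concurrency` threads; return all latencies in ms."""
    def worker(worker_id):
        samples = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            search()
            samples.append((time.perf_counter() - start) * 1000.0)
        return samples

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = pool.map(worker, range(concurrency))
        return [ms for samples in results for ms in samples]


def summarize(latencies):
    """Return the sample count, mean and 95th percentile latency in ms."""
    ordered = sorted(latencies)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {"count": len(ordered), "mean_ms": mean(ordered), "p95_ms": p95}
```

A run for one concurrency level could look like
summarize(measure_latency(run_search, 50, 100)), where run_search is a
placeholder for a function that executes one search query.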

Please share your thoughts on the latency for full text search and
attribute search.

[1] -
https://docs.google.com/a/wso2.com/spreadsheets/d/1fhwz5T-cIZzhpgs2Np6eQIeBN2yyB55EgWLXGtXjABg/edit?usp=sharing
[2] -
https://docs.google.com/a/wso2.com/spreadsheets/d/18qW6OeH9d7VFq1d6GaCRFQV09I9rbmmJq0EVrTqwb-M/edit?usp=sharing
[3] -
https://docs.google.com/a/wso2.com/spreadsheets/d/11okKYYeAz8OY7_2VAYnlqbwh79sog5bkplnZEsnPeQo/edit?usp=sharing
[4] -
https://docs.google.com/a/wso2.com/spreadsheets/d/1r0b9YlEGZ5VTFbPatTW14WBLBoDB7l6ecmK7arqS4KA/edit?usp=sharing


Thanks!
Rajith

On Tue, Jan 24, 2017 at 1:10 PM, Malith Jayasinghe <[email protected]> wrote:

> OK. You could also consider writing a small micro-benchmark to get the
> performance numbers (instead of testing this using an external load
> generator). This will minimize the impact of other components/layers on the
> results.
>
> On Tue, Jan 24, 2017 at 9:56 AM, Rajith Roshan <[email protected]> wrote:
>
>> Hi Malith,
>>
>> Thanks for your input. I have changed the JMeter scripts according to your
>> instructions.
>> I was using an Oracle Docker instance on my local machine. I have changed
>> it to a remote Oracle database. Now the latency is much lower. I will share
>> all the performance numbers once I have finished collecting them for the
>> Oracle, MSSQL, MySQL and Postgres databases.
>>
>> Thanks!
>> Rajith
>>
>> On Tue, Jan 24, 2017 at 9:46 AM, Malith Jayasinghe <[email protected]>
>> wrote:
>>
>>> @Rajith
>>>
>>> As per the offline discussion we had regarding the performance
>>> evaluation (using JMeter) of the two methods:
>>> - use 2 separate thread groups for evaluating the performance of the 2 DB
>>> based search methods
>>> - run each thread group sequentially
>>> - run the test for a longer period under different concurrency levels and
>>> record the latency and TPS
>>>
>>> With regard to the long latencies you are noticing, we need to figure
>>> out if this is related to the database query/queries or something else. To
>>> do a quick test: simply log the execution time of the query/queries. If the
>>> execution times are high (or have spikes) then we can try to optimize the
>>> DB query. Otherwise we have to do some further investigations.
>>>
>>> Thanks
>>>
>>> Malith
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Jan 14, 2017 at 10:43 PM, Nuwan Dias <[email protected]> wrote:
>>>
>>>> File system based indexers bring new challenges in the container world.
>>>> They also pose challenges in HA (multinode) and multi regional deployments.
>>>> Therefore if we're selecting a Solr indexing approach, the only practical
>>>> solution is to go with a Solr server based architecture.
>>>>
>>>> Even with a Solr server assisted architecture, we still have
>>>> complexities when dealing with multi regional deployments. Also since API
>>>> artifacts reside in the database, if we're loading search results from an
>>>> index, we have to sync the permissions of the artifacts in the index too.
>>>> Plus this makes the deployment complex.
>>>>
>>>> Given the complexities that have to be addressed in an indexing
>>>> solution, I'd prefer to opt for the DB based search, at least to start off
>>>> with. Since APIs and their permissions are both maintained in the DB, it
>>>> would be much simpler if the search also works on that. Unlike in C4, this
>>>> database will only have data of 1 tenant. Hence we're not expecting the API
>>>> count to be in the thousands. Therefore I think searching through the
>>>> database should be feasible.
>>>>
>>>> Thanks,
>>>> NuwanD.
>>>>
>>>> On Sat, 14 Jan 2017 at 8:27 am, Chandana Napagoda <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Rajith,
>>>>>
>>>>> I believe an indexing based approach is more suitable for any search
>>>>> solution because it is designed to return search results fast. If you use
>>>>> database search, you have to use multiple DB indexes to improve search
>>>>> performance, and that will reduce the performance of insert operations.
>>>>>
>>>>> I think a REG_LOG kind of history table is not necessary for the APIM
>>>>> product, since it is fully aware of the APIM artifacts (APIs, docs, forum
>>>>> posts), and you can connect the text search engine directly to the DB [1]
>>>>> and index the resources. As per the Solr documentation, it is
>>>>> capable of importing only the new additions (delta) for indexing.
>>>>>
>>>>> [1]. https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
>>>>>
>>>>> Regards,
>>>>> Chandana
>>>>>
>>>>>
>>>>> On Fri, Jan 13, 2017 at 11:44 AM, Rajith Roshan <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We are currently evaluating how to perform full text search at the
>>>>> database level for the C5 based API Manager. We will be evaluating this for
>>>>> different types of databases to find their implementation complexities and
>>>>> limitations.
>>>>> The other option available to us is an indexing based approach (using Solr).
>>>>>
>>>>> *Database full text search*
>>>>> *Pros*
>>>>>
>>>>>    - Fewer complications when using a container based approach
>>>>>    - Clustering will require only database syncing
>>>>>    - No need to maintain and ship an external search engine
>>>>>
>>>>> *Cons*
>>>>>
>>>>>    - Implementation may vary significantly based on the database type
>>>>>    - There can be limitations in full text search for particular
>>>>>    database types (for example, MySQL full text search supports only
>>>>>    prefix search)
>>>>>    - Queries will differ based on database type
>>>>>    - Document search will not be available, because documents are stored
>>>>>    as blobs
>>>>>
>>>>>
>>>>> *Indexing based approach*
>>>>> *Pros*
>>>>>
>>>>>    - Document search
>>>>>    - Search will be efficient (no need to access the database)
>>>>>
>>>>> *Cons*
>>>>>
>>>>>    - Since index data is written to the file system, a container
>>>>>    based approach would require a mechanism for mounting the file system
>>>>>    - Syncing indexers in a cluster would require something similar to
>>>>>    the existing C4 based registry architecture (use of the REG_LOG table)
>>>>>    - Maintaining (for example, version updates) and shipping an external
>>>>>    search engine
>>>>>
>>>>> Your valuable input regarding this is highly appreciated.
>>>>>
>>>>> Thanks!
>>>>> Rajith
>>>>>
>>>>> --
>>>>> Rajith Roshan
>>>>> Software Engineer, WSO2 Inc.
>>>>>
>>>>>
>>>>> Mobile: +94-72-642-8350
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Chandana Napagoda*
>>>>> Associate Technical Lead
>>>>> WSO2 Inc. - http://wso2.org
>>>>>
>>>>> *Email  :  [email protected]* *Mobile : +94718169299*
>>>>>
>>>>> *Blog  :    http://cnapagoda.blogspot.com | http://chandana.napagoda.com*
>>>>>
>>>>> *Linkedin : http://www.linkedin.com/in/chandananapagoda*
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> Malith Jayasinghe
>>>
>>> WSO2, Inc. (http://wso2.com)
>>> Email   :[email protected]
>>> Mobile :0770704040
>>> Blog     :https://medium.com/@malith.jayasinghe
>>> <https://medium.com/@malith.jayasinghe>
>>> Lean . Enterprise . Middleware
>>>



-- 
Rajith Roshan
Software Engineer, WSO2 Inc.
Mobile: +94-72-642-8350

