Re: [Dev] How to do secondary indexing with bam-data-publisher for Stratos-Logging

Srinath Perera Wed, 04 Jul 2012 00:01:17 -0700

Hi All,

Sorry for the late reply.


As I see it, secondary indexes are not part of the agent API, which is
about collecting the data.

Secondary indexes are about analyzing / presenting the data. So if users
need secondary indexes, they will have to go to Cassandra directly and
define them. I think we can do the same in the logging implementation.
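Going to Cassandra directly could look roughly like the following. This is a minimal sketch: the class name `SecondaryIndexDdl` and the column family name `log_events` are hypothetical, not part of any WSO2 or Cassandra API. It only builds CQL `CREATE INDEX` statements for the columns a log viewer filters on; running them (for example from cqlsh) is left out. Depending on the CQL version, mixed-case column names such as `appName` may need double quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper that builds CQL CREATE INDEX statements for the
 * columns a log viewer wants to filter on. Executing them against
 * Cassandra is outside this sketch.
 */
final class SecondaryIndexDdl {

    private SecondaryIndexDdl() {
    }

    /** One CREATE INDEX statement per column, named columnFamily_column_idx. */
    static List<String> createIndexStatements(String columnFamily, List<String> columns) {
        List<String> statements = new ArrayList<>();
        for (String column : columns) {
            String indexName = columnFamily + "_" + column + "_idx";
            statements.add("CREATE INDEX " + indexName
                    + " ON " + columnFamily + " (" + column + ");");
        }
        return statements;
    }

    public static void main(String[] args) {
        // Illustrative names only; the real column family is whatever the receiver created.
        for (String cql : createIndexStatements("log_events",
                Arrays.asList("appName", "logTime", "priority", "logger"))) {
            System.out.println(cql);
        }
    }
}
```

Keeping index creation in a separate step like this leaves the agent API focused on collecting data, as argued above.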

--Srinath

On Fri, Jun 22, 2012 at 12:34 AM, Suhothayan Sriskandarajah
<[email protected]>wrote:

>
>
> On Fri, Jun 22, 2012 at 2:36 PM, Amani Soysa <[email protected]> wrote:
>
>>
>>
>> On Fri, Jun 22, 2012 at 2:29 PM, Tharindu Mathew <[email protected]>wrote:
>>
>>> This can be a useful feature for real-time requirements. But we need to
>>> follow a proper convention, as this changes the event stream definition.
>>> This is an extensive change, and we are close to feature freeze.
>>>
>>> On Fri, Jun 22, 2012 at 2:15 PM, Amani Soysa <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jun 22, 2012 at 2:02 PM, Deependra Ariyadewa <[email protected]>wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 22, 2012 at 1:45 PM, Amani Soysa <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 22, 2012 at 1:24 PM, Tharindu Mathew 
>>>>>> <[email protected]>wrote:
>>>>>>
>>>>>>> This will be useful for folks who want real-time data access, but
>>>>>>> BAM is not designed to be real-time. I don't want the Agent API to be
>>>>>>> specific to Cassandra, either.
>>>>>>>
>>>>>>> There should be a clean way to do this. How did you decide to do it
>>>>>>> this way? Was there a discussion?
>>>>>>>
>>>>>> Yes, there was a discussion on this some time back in the Architecture
>>>>>> thread "RFC: Architecture for Stratos Log Processing", where we decided to
>>>>>> push logs to the BAM event receiver through the publisher and view logs
>>>>>> using the Hector API.
>>>>>>
>>>>>
>>>>> Initially we tried to use Flume as the Stratos log collector/manager,
>>>>> but we stopped the Flume evaluation because BAM can cover the same use
>>>>> case.
>>>>>
>>>>> There are several workarounds, such as creating the relevant keyspaces at
>>>>> tenant creation time or creating an extended event receiver only for logging.
>>>>>
>>>> If we do it at tenant creation time, there will be around 10 keyspaces
>>>> per tenant (given that we create a keyspace for each server, as I
>>>> explained earlier). So even if a user doesn't use a particular server,
>>>> there will be a keyspace for it. If there are 1000 tenants, there will be
>>>> 10,000 keyspaces (even if some keyspaces are not used at all). So I think
>>>> it's better to create a keyspace whenever logs are first published to it.
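The arithmetic is tenants × servers per tenant: 1000 × 10 = 10,000 keyspaces when everything is created eagerly. Creating on first publish can be sketched as a registry that runs a creation callback at most once per keyspace name. The class and method names below are hypothetical, and the callback stands in for whatever actually issues the create-keyspace call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical registry that creates a log keyspace only when it is first used. */
final class LazyKeyspaceRegistry {

    private final ConcurrentHashMap<String, Boolean> created = new ConcurrentHashMap<>();
    private final Consumer<String> createKeyspace; // stand-in for the real create call

    LazyKeyspaceRegistry(Consumer<String> createKeyspace) {
        this.createKeyspace = createKeyspace;
    }

    /** Call before publishing; the create callback runs at most once per keyspace name. */
    void ensureKeyspace(String keyspace) {
        created.computeIfAbsent(keyspace, name -> {
            // If this throws, no mapping is recorded and the next call retries.
            createKeyspace.accept(name);
            return Boolean.TRUE;
        });
    }

    int createdCount() {
        return created.size();
    }

    /** Keyspaces needed if every tenant gets every server's keyspace up front. */
    static long eagerKeyspaceCount(long tenants, long serversPerTenant) {
        return tenants * serversPerTenant;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        LazyKeyspaceRegistry registry = new LazyKeyspaceRegistry(calls::add);
        registry.ensureKeyspace("org_wso2_logging_tenant1_application_server");
        registry.ensureKeyspace("org_wso2_logging_tenant1_application_server");
        System.out.println(calls.size());                  // 1
        System.out.println(eagerKeyspaceCount(1000, 10));  // 10000
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the mapping function at most once per key, which is what keeps concurrent publishers from creating the same keyspace twice. Only keyspaces that actually receive logs are ever created.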
>>>>
>>> 10 keyspaces per tenant?? Are you sure that's right...
>>>
>> Yes, one for each server (maybe more than 10 :) ),
>> i.e. data services, appserver, esb, mb, cep, bps, brs, etc., all the products
>> we offer for Stratos deployment, because we store logs for each server in a
>> different keyspace. (Otherwise we would have to keep everything in a single
>> keyspace per tenant, and filtering logs at the server level would be an
>> expensive search, because we give users server-specific logs.)
>> In our earlier syslog implementation we divided logs per server for fast
>> results (as logs are generated). That's why it's very important not to
>> create keyspaces if users are not using a particular product.
>>
>
> I also agree with Tharindu that we should not make the stream definition
> Cassandra-specific, since it can also be used for a JDBC data store or an
> in-memory data store for CEP.
>
> Since this is a Cassandra-specific issue, and since we know the stream
> definition in advance, I believe it's appropriate to have a Cassandra data
> store configuration that maps the stream definition to the appropriate
> Cassandra create-keyspace query. Then, when the client sends a
> defineEventStream request, the Cassandra data store can first check whether
> the stream matches one of the entries in the configuration; if so, it runs
> the configured create-keyspace query to create the keyspace with indexes,
> otherwise it creates the keyspace in the normal way.
> This also lets us restrict unnecessary keyspace creation.
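The configuration lookup described above can be sketched as an ordered list of stream-name patterns, each mapped to a create-keyspace query. This is a minimal sketch under assumptions: the class `StreamKeyspaceConfig`, its methods, and the use of regular expressions to match stream names are hypothetical, and the query strings are placeholders rather than real Cassandra DDL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Hypothetical Cassandra data-store configuration: maps stream names to a
 * create-keyspace query (for example one that also declares indexes).
 */
final class StreamKeyspaceConfig {

    // Insertion order is kept so that the first matching entry wins.
    private final Map<Pattern, String> entries = new LinkedHashMap<>();

    /** Registers a create query for every stream name that fully matches the regex. */
    StreamKeyspaceConfig map(String streamNameRegex, String createQuery) {
        entries.put(Pattern.compile(streamNameRegex), createQuery);
        return this;
    }

    /** The configured query for this stream, if any entry matches. */
    Optional<String> queryFor(String streamName) {
        for (Map.Entry<Pattern, String> entry : entries.entrySet()) {
            if (entry.getKey().matcher(streamName).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** What a defineEventStream handler would run: the configured query, else the normal one. */
    String resolve(String streamName, String defaultQuery) {
        return queryFor(streamName).orElse(defaultQuery);
    }
}
```

A single entry for a pattern like `org\.wso2\.carbon\.logging\..*` would cover the logging streams of every tenant and server, while every other stream keeps the default behaviour. The stream definition itself stays store-agnostic, since the mapping lives only in the Cassandra store's configuration.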
>
> Regards
> Suho
>
>>
>>>>> Thanks,
>>>>>
>>>>> Deependra.
>>>>>
>>>>>>
>>>>>>> On Fri, Jun 22, 2012 at 8:45 AM, Amani Soysa <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Currently we are sending LogEvent data through the BAM data publisher
>>>>>>>> to the BAM event receiver using a custom log4j appender, and we retrieve
>>>>>>>> logs using the Hector API for the Carbon log viewer. However, we need
>>>>>>>> secondary indexes on several columns (such as date, applicationName,
>>>>>>>> priority, logger, etc.) so that we can filter log information by a given
>>>>>>>> column, and those indexes have to be defined when the data publisher
>>>>>>>> creates the keyspace. With the current BAM data publisher implementation
>>>>>>>> we cannot define secondary indexes; all we can do is define the column
>>>>>>>> names and their data types, and the receiver creates the keyspaces for
>>>>>>>> the given columns with those data types.
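What a secondary index buys the log viewer can be shown with a small in-memory model: for each indexed column, keep a map from value to the row keys holding it, so a filter such as priority = ERROR becomes a lookup instead of a scan over every row. This is a conceptual sketch only; the class `InMemorySecondaryIndex` is hypothetical and does not reflect how Cassandra stores its indexes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Conceptual model of a secondary index over log rows (not Cassandra's implementation). */
final class InMemorySecondaryIndex {

    private final Set<String> indexedColumns;
    // column name -> column value -> row keys with that value
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    InMemorySecondaryIndex(Set<String> indexedColumns) {
        this.indexedColumns = new HashSet<>(indexedColumns);
    }

    /** Records a row; only the indexed columns are added to the index. */
    void put(String rowKey, Map<String, String> columns) {
        for (Map.Entry<String, String> column : columns.entrySet()) {
            if (!indexedColumns.contains(column.getKey())) {
                continue;
            }
            index.computeIfAbsent(column.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(column.getValue(), v -> new TreeSet<>())
                    .add(rowKey);
        }
    }

    /** Row keys whose column equals value; empty for unindexed columns or unseen values. */
    Set<String> rowsWhere(String column, String value) {
        Map<String, Set<String>> byValue = index.getOrDefault(column, Collections.emptyMap());
        return Collections.unmodifiableSet(byValue.getOrDefault(value, Collections.emptySet()));
    }
}
```

Only the columns chosen up front are indexed, which is why the indexes have to be declared when the keyspace is created rather than at query time.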
>>>>>>>>
>>>>>>>>  streamId = dataPublisher.defineEventStream("{"
>>>>>>>>          + "  'name':'org.wso2.carbon.logging.$tenantId.$serverName',"
>>>>>>>>          + "  'version':'1.0.0',"
>>>>>>>>          + "  'nickName': 'Logs',"
>>>>>>>>          + "  'description': 'Logging Event',"
>>>>>>>>          + "  'metaData':["
>>>>>>>>          + "          {'name':'clientType','type':'STRING'}"
>>>>>>>>          + "  ],"
>>>>>>>>          + "  'payloadData':["
>>>>>>>>          + "          {'name':'tenantID','type':'STRING'},"
>>>>>>>>          + "          {'name':'serverName','type':'STRING'},"
>>>>>>>>          + "          {'name':'appName','type':'STRING'},"
>>>>>>>>          + "          {'name':'logTime','type':'LONG'},"
>>>>>>>>          + "          {'name':'logger','type':'STRING'},"
>>>>>>>>          + "          {'name':'priority','type':'STRING'},"
>>>>>>>>          + "          {'name':'message','type':'STRING'},"
>>>>>>>>          + "          {'name':'ip','type':'STRING'},"
>>>>>>>>          + "          {'name':'stacktrace','type':'STRING'},"
>>>>>>>>          + "          {'name':'instance','type':'STRING'}"
>>>>>>>>          + "  ]"
>>>>>>>>          + "}");
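The hand-concatenated definition above is easy to break when columns are added or removed. A minimal sketch, assuming the same single-quoted definition format is accepted by defineEventStream, builds the string from one ordered column list instead. The class `LogStreamDefinition` is hypothetical, and the stream name is passed in, so how the `$tenantId` / `$serverName` parts are filled is left to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for the logging stream definition shown above. */
final class LogStreamDefinition {

    private LogStreamDefinition() {
    }

    /** Payload columns of the logging stream, in order, with their types. */
    static Map<String, String> payloadColumns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("tenantID", "STRING");
        cols.put("serverName", "STRING");
        cols.put("appName", "STRING");
        cols.put("logTime", "LONG");
        cols.put("logger", "STRING");
        cols.put("priority", "STRING");
        cols.put("message", "STRING");
        cols.put("ip", "STRING");
        cols.put("stacktrace", "STRING");
        cols.put("instance", "STRING");
        return cols;
    }

    /** Renders columns as [{'name':...,'type':...},...]. */
    private static String attributes(Map<String, String> cols) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> c : cols.entrySet()) {
            parts.add("{'name':'" + c.getKey() + "','type':'" + c.getValue() + "'}");
        }
        return "[" + String.join(",", parts) + "]";
    }

    /** Builds the definition string that would be passed to defineEventStream. */
    static String build(String streamName) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("clientType", "STRING");
        return "{'name':'" + streamName + "',"
                + "'version':'1.0.0',"
                + "'nickName':'Logs',"
                + "'description':'Logging Event',"
                + "'metaData':" + attributes(meta) + ","
                + "'payloadData':" + attributes(payloadColumns())
                + "}";
    }
}
```

The same `payloadColumns()` map could also feed the list of columns to index, so the stream definition and the index declarations cannot drift apart.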
>>>>>>>>
>>>>>>>> Is it possible to have a Cassandra-specific event receiver (for
>>>>>>>> logging purposes) so that we can create keyspaces with secondary
>>>>>>>> indexes [1], and so that keyspaces are created whenever logs are
>>>>>>>> published? Or do we need to create keyspaces at tenant creation time?
>>>>>>>> For a given tenant we need to create several keyspaces, one per server
>>>>>>>> (and, if possible, per application as well, so we get better
>>>>>>>> performance when viewing logs), i.e.
>>>>>>>>      keyspace1 - org_wso2_logging_tenant1_application_server (stores
>>>>>>>>                  AS-specific logs)
>>>>>>>>      keyspace2 - org_wso2_logging_tenant1_data_services_server (stores
>>>>>>>>                  DSS-specific logs)
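The two example names suggest a scheme of a fixed prefix plus tenant plus server. A minimal sketch, assuming that scheme (inferred from the examples only) and the fact that Cassandra keyspace names are restricted to letters, digits, and underscores, derives the name by lower-casing each part and collapsing every other character run to `_`. The class `LogKeyspaceNames` is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper deriving per-tenant, per-server log keyspace names. */
final class LogKeyspaceNames {

    // Prefix taken from the example keyspace names above.
    private static final String PREFIX = "org_wso2_logging_";

    private LogKeyspaceNames() {
    }

    /** Lower-cases and replaces each run of non-alphanumeric characters with '_'. */
    static String sanitize(String part) {
        return part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
    }

    static String keyspaceFor(String tenant, String server) {
        return PREFIX + sanitize(tenant) + "_" + sanitize(server);
    }

    public static void main(String[] args) {
        System.out.println(keyspaceFor("tenant1", "Application Server"));
        System.out.println(keyspaceFor("tenant1", "data-services-server"));
    }
}
```

Sanitizing in one place keeps server display names with spaces or hyphens from producing invalid keyspace names.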
>>>>>>>>
>>>>>>>> Please note that we cannot use BAM analytics to view logs because
>>>>>>>> we need a real-time log viewer.
>>>>>>>>
>>>>>>>> [1] - https://wso2.org/jira/browse/CARBON-13468
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Amani
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tharindu
>>>>>>>
>>>>>>> blog: http://mackiemathew.com/
>>>>>>> M: +94777759908
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Deependra Ariyadewa
>>>>> WSO2, Inc. http://wso2.com/ http://wso2.org
>>>>>
>>>>> email [email protected]; cell +94 71 403 5996 ;
>>>>> Blog http://risenfall.wordpress.com/
>>>>> PGP info: KeyID: 'DC627E6F'
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> S. Suhothayan
> Software Engineer,
> Data Technologies Team,
> WSO2, Inc. http://wso2.com
> lean.enterprise.middleware.
>
> email: [email protected] cell: (+94) 779 756 757
> blog: http://suhothayan.blogspot.com/
> twitter: http://twitter.com/suhothayan
> linked-in: http://lk.linkedin.com/in/suhothayan
>
>


-- 
============================
Srinath Perera, Ph.D.
  Senior Software Architect, WSO2 Inc.
  Visiting Faculty, University of Moratuwa
  Member, Apache Software Foundation
  Research Scientist, Lanka Software Foundation
  Blog: http://srinathsview.blogspot.com/
  Photos: http://www.flickr.com/photos/hemapani/
 Phone: 0772360902

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
