Hi Sinthuja,

Yes, data purging is disabled by default. It is a DevOps decision to
enable or disable the purging task and to come up with suitable inputs
such as the retention period, the tables to purge, etc.
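
For example, an enabled configuration in analytics-conf.xml could look
roughly like the following (the table pattern and retention period are
illustrative values, not recommendations):

   <analytics-data-purging>
      <purging-enable>true</purging-enable>
      <purge-node>true</purge-node>
      <!-- run the purge task daily at midnight -->
      <cron-expression>0 0 0 * * ?</cron-expression>
      <purge-include-table-patterns>
         <table>ORG_WSO2_SAMPLE_.*</table>
      </purge-include-table-patterns>
      <data-retention-days>30</data-retention-days>
   </analytics-data-purging>
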
Regards,
Gihan

On Mon, Jun 29, 2015 at 2:38 PM, Sinthuja Ragendran <[email protected]>
wrote:

> Hi Nirmal,
>
> When purging is disabled, any already registered purging task has to be
> deleted, so the task service needs to be accessed regardless of whether
> purging is enabled or disabled.
>
> But we can check for the existence of the task service and perform the
> analytics purging operations if and only if the task service is
> registered. With this we can resolve the issue irrespective of the above
> configuration. We can also log a warning when purging is enabled but the
> task service is not registered.
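>
> A rough sketch of the check I have in mind, assuming a ServiceHolder
> accessor for the task service (the accessor and the two helpers below
> are illustrative names, not the actual Carbon API):
>
>     private void initAnalyticsPurging(boolean purgingEnabled) {
>         TaskService taskService = ServiceHolder.getTaskService(); // assumed accessor
>         if (taskService == null) {
>             if (purgingEnabled) {
>                 log.warn("Analytics data purging is enabled, but the task "
>                         + "service is not registered; skipping the purging task.");
>             }
>             return; // no task service, so no purging operations at all
>         }
>         if (purgingEnabled) {
>             schedulePurgingTask(taskService);       // hypothetical helper
>         } else {
>             deletePurgingTaskIfExists(taskService); // hypothetical helper
>         }
>     }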
>
> @Gihan: I think purging needs to be enabled by default for continuous
> operation with an RDBMS datasource, so that too much data doesn't
> accumulate in the datasource. Is there any reason for it to be disabled?
>
> Thanks,
> Sinthuja.
>
> On Mon, Jun 29, 2015 at 2:07 PM, Nirmal Fernando <[email protected]> wrote:
>
>> That worked, Sinthuja! Thanks. However, is it possible to skip the Task
>> Service initialization when purging is disabled (which is the default
>> behaviour)?
>>
>> <analytics-data-purging>
>>    <purging-enable>false</purging-enable>
>>    <purge-node>true</purge-node>
>>    <cron-expression>0 0 0 * * ?</cron-expression>
>>    <purge-include-table-patterns>
>>       <table>.*</table>
>>    </purge-include-table-patterns>
>>    <data-retention-days>365</data-retention-days>
>> </analytics-data-purging>
>>
>> On Mon, Jun 29, 2015 at 1:57 PM, Sinthuja Ragendran <[email protected]>
>> wrote:
>>
>>> Hi Nirmal,
>>>
>>> Thanks for sharing the necessary details. This happens because the data
>>> purging configuration is enabled in analytics-conf.xml, and it uses the
>>> task service internally. Can you please try commenting out the analytics
>>> purging configuration in repository/conf/analytics/analytics-conf.xml and
>>> see?
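>>>
>>> i.e. comment out the whole block, something like:
>>>
>>> <!--
>>> <analytics-data-purging>
>>>    ...
>>> </analytics-data-purging>
>>> -->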
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>> On Mon, Jun 29, 2015 at 1:44 PM, Nirmal Fernando <[email protected]>
>>> wrote:
>>>
>>>> Hi Sinthuja,
>>>>
>>>> Thanks for the explanation. I think I should have said DAL instead of
>>>> DAS. Yes, what I'm talking about here are the DAL features. The exact
>>>> error is [1], and the reason for it is the TaskService being null. Can
>>>> you please check?
>>>>
>>>> [1]
>>>>
>>>> 15/06/28 11:54:51 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.4 KB, free 265.1 MB)
>>>> 15/06/28 11:55:02 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>>> java.lang.NullPointerException
>>>>         at org.wso2.carbon.analytics.dataservice.AnalyticsDataServiceImpl.<init>(AnalyticsDataServiceImpl.java:149)
>>>>         at org.wso2.carbon.analytics.dataservice.AnalyticsServiceHolder.checkAndPopulateCustomAnalyticsDS(AnalyticsServiceHolder.java:79)
>>>>         at org.wso2.carbon.analytics.dataservice.AnalyticsServiceHolder.getAnalyticsDataService(AnalyticsServiceHolder.java:67)
>>>>         at org.wso2.carbon.analytics.spark.core.internal.ServiceHolder.getAnalyticsDataService(ServiceHolder.java:73)
>>>>         at org.wso2.carbon.analytics.spark.core.util.AnalyticsRDD.compute(AnalyticsRDD.java:81)
>>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> On Mon, Jun 29, 2015 at 12:16 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi Nirmal,
>>>>>
>>>>> DAS features such as script scheduling, purging, etc. are used to
>>>>> submit jobs (only Spark queries) to the external Spark cluster; the
>>>>> jars for those DAS features don't need to exist within the external
>>>>> Spark cluster instance. For example, consider the scheduled Spark
>>>>> script execution scenario, which uses the Task OSGi service: the task
>>>>> is triggered within the DAS node (the OSGi environment), and when Spark
>>>>> is configured externally the job is handed over to the external
>>>>> cluster, with the results returned to the DAS node. Therefore I don't
>>>>> think any DAS feature jars other than the DAL feature jars will be
>>>>> required inside the external Spark cluster.
>>>>>
>>>>> Can you please explain your use case in more detail, and how you have
>>>>> configured the setup with the DAS features?
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>>
>>>>> On Sunday, June 28, 2015, Nirmal Fernando <[email protected]> wrote:
>>>>>
>>>>>> Hi DAS team,
>>>>>>
>>>>>> It appears that we have to design and implement DAS features so that
>>>>>> they will run even in a non-OSGi environment, such as an external
>>>>>> Spark scenario. We have some DAS features that depend on the Task
>>>>>> Service etc., and they fail when we use them from within a Spark job
>>>>>> running on an external Spark cluster.
>>>>>>
>>>>>> How can we solve this?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Thanks & regards,
>>>>>> Nirmal
>>>>>>
>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>> Mobile: +94715779733
>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks & regards,
>>>> Nirmal
>>>>
>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>> Mobile: +94715779733
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> *Sinthuja Rajendran*
>>> Associate Technical Lead
>>> WSO2, Inc.:http://wso2.com
>>>
>>> Blog: http://sinthu-rajan.blogspot.com/
>>> Mobile: +94774273955
>>>
>>>
>>>
>>
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>>
>>
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
W.G. Gihan Anuruddha
Senior Software Engineer | WSO2, Inc.
M: +94772272595
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
