[ https://issues.apache.org/jira/browse/SDAP-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397947#comment-16397947 ]

ASF GitHub Bot commented on SDAP-40:
------------------------------------

fgreg commented on a change in pull request #8: SDAP-40 Update Spark Time 
Series algorithm to handle multiple time steps per tile
URL: https://github.com/apache/incubator-sdap-nexus/pull/8#discussion_r174330935
 
 

 ##########
 File path: analysis/webservice/algorithms_spark/TimeSeriesSpark.py
 ##########
 @@ -178,27 +178,10 @@ def calc(self, request, **args):
         for shortName in ds:
 
             the_time = datetime.now()
-            daysinrange = self._tile_service.find_days_in_range_asc(bounding_polygon.bounds[1],
-                                                                    bounding_polygon.bounds[3],
-                                                                    bounding_polygon.bounds[0],
-                                                                    bounding_polygon.bounds[2],
-                                                                    shortName,
-                                                                    start_seconds_from_epoch,
-                                                                    end_seconds_from_epoch)
-            self.log.info("Finding days in range took %s for dataset %s" % (str(datetime.now() - the_time), shortName))
-
-            ndays = len(daysinrange)
-            if ndays == 0:
-                raise NoDataException(reason="No data found for selected timeframe")
-
-            self.log.debug('Found {0} days in range'.format(ndays))
-            for i, d in enumerate(daysinrange):
-                self.log.debug('{0}, {1}'.format(i, datetime.utcfromtimestamp(d)))
-            spark_nparts_needed = min(spark_nparts, ndays)
-
-            the_time = datetime.now()
-            results, meta = spark_driver(daysinrange, bounding_polygon, shortName,
-                                         spark_nparts_needed=spark_nparts_needed, sc=self._sc)
+            results, meta = spark_driver(start_seconds_from_epoch,
 
 Review comment:
   This is good, but I'm thinking we should run a sanity-check query to make sure there are at least some tiles in the search domain before passing off to Spark. Otherwise we incur the cost of spinning up a bunch of Spark tasks for no reason.
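A minimal sketch of the guard suggested above: query for tiles in the search domain first, and raise before any Spark work is scheduled if nothing matches. The tile-service method name (`find_tiles_in_box`) and its argument order are assumptions for illustration, not the actual SDAP `NexusTileService` API; `NoDataException` mirrors the exception already used in the removed code.

```python
class NoDataException(Exception):
    """Mirrors the exception raised in the code removed by this PR."""
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def check_tiles_exist(tile_service, bounds, short_name,
                      start_seconds, end_seconds):
    """Sanity check: return matching tiles, or raise before Spark spins up.

    `bounds` follows the shapely convention used in TimeSeriesSpark.py:
    (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    # Hypothetical tile-service query; the real method name may differ.
    tiles = tile_service.find_tiles_in_box(min_lat, max_lat, min_lon, max_lon,
                                           short_name,
                                           start_seconds, end_seconds)
    if len(tiles) == 0:
        raise NoDataException(reason="No data found for selected timeframe")
    return tiles
```

The check would run once per dataset in `calc()`, just before the `spark_driver(...)` call, so an empty domain fails fast with the same error the old per-driver code produced.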

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Spark Time Series algorithm to handle multiple time steps per tile
> -------------------------------------------------------------------------
>
>                 Key: SDAP-40
>                 URL: https://issues.apache.org/jira/browse/SDAP-40
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Improvement
>            Reporter: Joseph Jacob
>            Assignee: Joseph Jacob
>            Priority: Major
>
> To date, NEXUS has partitioned data to have 1 time step per tile.  For SWOT 
> we want to have multiple time steps per tile for efficiency of the 
> computations.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
