fgreg commented on a change in pull request #8: SDAP-40 Update Spark Time Series algorithm to handle multiple time steps per tile
URL: https://github.com/apache/incubator-sdap-nexus/pull/8#discussion_r174330935
##########
File path: analysis/webservice/algorithms_spark/TimeSeriesSpark.py
##########

@@ -178,27 +178,10 @@ def calc(self, request, **args):
         for shortName in ds:
             the_time = datetime.now()
-            daysinrange = self._tile_service.find_days_in_range_asc(bounding_polygon.bounds[1],
-                                                                    bounding_polygon.bounds[3],
-                                                                    bounding_polygon.bounds[0],
-                                                                    bounding_polygon.bounds[2],
-                                                                    shortName,
-                                                                    start_seconds_from_epoch,
-                                                                    end_seconds_from_epoch)
-            self.log.info("Finding days in range took %s for dataset %s" % (str(datetime.now() - the_time), shortName))
-
-            ndays = len(daysinrange)
-            if ndays == 0:
-                raise NoDataException(reason="No data found for selected timeframe")
-
-            self.log.debug('Found {0} days in range'.format(ndays))
-            for i, d in enumerate(daysinrange):
-                self.log.debug('{0}, {1}'.format(i, datetime.utcfromtimestamp(d)))
-            spark_nparts_needed = min(spark_nparts, ndays)
-
-            the_time = datetime.now()
-            results, meta = spark_driver(daysinrange, bounding_polygon, shortName,
-                                         spark_nparts_needed=spark_nparts_needed, sc=self._sc)
+            results, meta = spark_driver(start_seconds_from_epoch,

Review comment:
   This is good, but I'm thinking we should run a sanity-check query to make sure there are at least some tiles in the search domain before handing off to Spark. Otherwise we incur the cost of spinning up a bunch of Spark tasks for no reason.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
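The suggested pre-check could look roughly like the sketch below. This is a minimal illustration of the reviewer's idea, not code from the PR: `find_tiles_in_box`, its parameters, and the `run_spark_with_sanity_check` wrapper are assumed names for a cheap tile-existence query against the tile store; the real service call may differ.

```python
class NoDataException(Exception):
    """Raised when the requested spatiotemporal domain contains no tiles."""

    def __init__(self, reason="No data found"):
        super().__init__(reason)
        self.reason = reason


def run_spark_with_sanity_check(tile_service, bounding_polygon, short_name,
                                start_seconds_from_epoch, end_seconds_from_epoch,
                                spark_driver):
    """Run a cheap tile-existence query before launching the Spark job.

    `find_tiles_in_box` and its signature are assumptions for illustration;
    any inexpensive count/search against the tile store would serve the
    same purpose.
    """
    # Shapely-style bounds tuple: (min_lon, min_lat, max_lon, max_lat).
    min_lon, min_lat, max_lon, max_lat = bounding_polygon.bounds
    tiles = tile_service.find_tiles_in_box(min_lat, max_lat, min_lon, max_lon,
                                           ds=short_name,
                                           start_time=start_seconds_from_epoch,
                                           end_time=end_seconds_from_epoch,
                                           fetch_data=False)
    if len(tiles) == 0:
        # Fail fast instead of spinning up Spark tasks that will find nothing.
        raise NoDataException(reason="No data found for selected timeframe")
    return spark_driver(start_seconds_from_epoch, end_seconds_from_epoch,
                        bounding_polygon, short_name)
```

The design trade-off is the one the reviewer names: one extra metadata query per dataset versus the fixed cost of scheduling Spark tasks over an empty domain.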