fgreg commented on a change in pull request #8: SDAP-40 Update Spark Time Series algorithm to handle multiple time steps per tile
URL: https://github.com/apache/incubator-sdap-nexus/pull/8#discussion_r174330935
##########
File path: analysis/webservice/algorithms_spark/TimeSeriesSpark.py
##########

@@ -178,27 +178,10 @@ def calc(self, request, **args):
         for shortName in ds:
             the_time = datetime.now()
-            daysinrange = self._tile_service.find_days_in_range_asc(bounding_polygon.bounds[1],
-                                                                    bounding_polygon.bounds[3],
-                                                                    bounding_polygon.bounds[0],
-                                                                    bounding_polygon.bounds[2],
-                                                                    shortName,
-                                                                    start_seconds_from_epoch,
-                                                                    end_seconds_from_epoch)
-            self.log.info("Finding days in range took %s for dataset %s" % (str(datetime.now() - the_time), shortName))
-
-            ndays = len(daysinrange)
-            if ndays == 0:
-                raise NoDataException(reason="No data found for selected timeframe")
-
-            self.log.debug('Found {0} days in range'.format(ndays))
-            for i, d in enumerate(daysinrange):
-                self.log.debug('{0}, {1}'.format(i, datetime.utcfromtimestamp(d)))
-            spark_nparts_needed = min(spark_nparts, ndays)
-
-            the_time = datetime.now()
-            results, meta = spark_driver(daysinrange, bounding_polygon, shortName,
-                                         spark_nparts_needed=spark_nparts_needed, sc=self._sc)
+            results, meta = spark_driver(start_seconds_from_epoch,

Review comment:
   This is good, but I'm thinking we should run a sanity-check query to make sure there are at least some tiles in the search domain before handing off to Spark. Otherwise we incur the cost of spinning up a bunch of Spark tasks for no reason.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
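The suggested pre-check could look roughly like the sketch below. This is a minimal illustration of the reviewer's idea, not code from the PR: `find_tiles_in_box`, its parameters, and the `run_spark_with_sanity_check` wrapper are assumed names for a cheap tile-existence query against the tile store; the real service call may differ.

```python
class NoDataException(Exception):
    """Raised when the requested spatiotemporal domain contains no tiles."""

    def __init__(self, reason="No data found"):
        super().__init__(reason)
        self.reason = reason


def run_spark_with_sanity_check(tile_service, bounding_polygon, short_name,
                                start_seconds_from_epoch, end_seconds_from_epoch,
                                spark_driver):
    """Run a cheap tile-existence query before launching the Spark job.

    `find_tiles_in_box` and its signature are assumptions for illustration;
    any inexpensive count/search against the tile store would serve the
    same purpose.
    """
    # Shapely-style bounds tuple: (min_lon, min_lat, max_lon, max_lat).
    min_lon, min_lat, max_lon, max_lat = bounding_polygon.bounds
    tiles = tile_service.find_tiles_in_box(min_lat, max_lat, min_lon, max_lon,
                                           ds=short_name,
                                           start_time=start_seconds_from_epoch,
                                           end_time=end_seconds_from_epoch,
                                           fetch_data=False)
    if len(tiles) == 0:
        # Fail fast instead of spinning up Spark tasks that will find nothing.
        raise NoDataException(reason="No data found for selected timeframe")
    return spark_driver(start_seconds_from_epoch, end_seconds_from_epoch,
                        bounding_polygon, short_name)
```

The design trade-off is the one the reviewer names: one extra metadata query per dataset versus the fixed cost of scheduling Spark tasks over an empty domain.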