Julian, how do you plan to use Gandiva in Apache Calcite?

On Tue, Apr 20, 2021 at 9:57 PM Julian Hyde <jh...@apache.org> wrote:

> We would love to use Gandiva in Apache Calcite [1] but we are blocked
> because the JAR on Maven Central doesn't work on macOS, Linux or
> Windows  [2] and there seems to be no interest in fixing the problem.
> So I doubt whether anyone is using Gandiva in production (unless they
> have built the artifacts for themselves).
>
> Once Gandiva is working for us we will have an opinion about caching.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2040
>
> [2] https://issues.apache.org/jira/browse/ARROW-11135
>
> On Tue, Apr 20, 2021 at 2:58 AM Vivekanand Vellanki <vi...@dremio.com>
> wrote:
> >
> > We are considering using an on-disk cache - this is planned for later.
> > Even with an on-disk cache, we still need an eviction policy to ensure
> > that Gandiva doesn't use up the entire disk.
> >
> > For now, we are assuming that we can measure the cost accurately - the
> > assumption is that the query engine would use Gandiva on a thread that is
> > pinned to a core. For other engines, an alternate estimate of cost can be
> > the complexity of the expression.
> >
> > On Tue, Apr 20, 2021 at 2:46 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> > >
> > > Hi Projjal,
> > >
> > > > The main issue here is to compute the cost accurately (is it
> > > > computation runtime? memory footprint? can you measure the
> > > > computation time accurately, regardless of system noise - e.g. other
> > > > threads and processes?).
> > > >
> > > > Intuitively, if the LRU cache shows too many misses, a simple measure
> > > > is to increase its size ;-)
> > >
> > > Last question: have you considered a second level on-disk cache?  Numba
> > > uses such a cache with good results:
> > > https://numba.readthedocs.io/en/stable/developer/caching.html
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 20/04/2021 à 06:28, Projjal Chanda a écrit :
> > > > Hi,
> > > > > We currently have a cache [1] in Gandiva that caches the built
> > > > > projector or filter module with an LRU-based eviction policy.
> > > > > However, since the cost of building different expressions is not
> > > > > uniform, it makes sense to have a different eviction policy that
> > > > > takes into account the cost of a cache miss (while also discounting
> > > > > items that have not been recently used). We are planning to use the
> > > > > GreedyDual-Size algorithm [2], which seems fit for the purpose. The
> > > > > algorithm is quite simple:
> > > > > - Each item has a cost (build time in our case), and the item with
> > > > >   the lowest cost (c_min) is evicted. The costs of all other items
> > > > >   are reduced by c_min.
> > > > > - On a cache hit, the item's cost is restored to its original value.
> > > > >
> > > > > This can be implemented using a priority queue, and an efficient
> > > > > implementation can handle both a cache hit and an eviction in
> > > > > O(log k) time.
> > > >
> > > > Does anybody have any other suggestions or ideas on this?
> > > >
> > > > [1]
> https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h
> > > <https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h>
> > > > [2]
> > >
> https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
> > > <
> > >
> https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
> > > >
> > > >
> > > > Regards,
> > > > Projjal
> > > >
> > >
>
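
For reference, the eviction policy Projjal describes in the quoted message can be sketched roughly as below. This is only an illustration, not Gandiva's cache or its API: the class and method names are made up, and the caller is assumed to pass the cost (e.g. measured build time) in. Instead of deducting c_min from every remaining item on each eviction, the sketch keeps a running "inflation" offset and stores priority = inflation + cost, which gives the same ordering with only heap operations. Stale heap records are skipped lazily on eviction.

```python
import heapq
import itertools


class GreedyDualCache:
    """Illustrative cost-aware cache (hypothetical names, not Gandiva's API)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.inflation = 0.0          # priority of the most recently evicted entry
        self.entries = {}             # key -> (value, cost, seq of its live heap record)
        self.heap = []                # (priority, seq, key); may hold stale records
        self.seq = itertools.count()  # unique tie-breaker, also identifies live records

    def _push(self, key, value, cost):
        # Priority is the original cost on top of the current inflation level.
        seq = next(self.seq)
        self.entries[key] = (value, cost, seq)
        heapq.heappush(self.heap, (self.inflation + cost, seq, key))

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None  # cache miss
        value, cost, _ = entry
        # Cache hit: restore the entry's full cost relative to the current inflation.
        self._push(key, value, cost)
        return value

    def put(self, key, value, cost):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self._push(key, value, cost)

    def _evict(self):
        while self.heap:
            priority, seq, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is not None and entry[2] == seq:
                # Live record with the lowest priority: evict it and raise the
                # inflation level, which is equivalent to deducting c_min from
                # every remaining entry.
                del self.entries[key]
                self.inflation = priority
                return
            # Otherwise the record is stale (superseded by a hit or update); skip it.
```

Each hit or insert is one heap push and each eviction is a pop plus any skipped stale records, so the work is logarithmic in the heap size, amortized. The heap can hold more records than live entries between evictions; a real implementation would probably want to compact it or use an indexed heap.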
