[
https://issues.apache.org/jira/browse/DRILL-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065425#comment-18065425
]
ASF GitHub Bot commented on DRILL-8529:
---------------------------------------
TomIsFrank commented on PR #3023:
URL: https://github.com/apache/drill/pull/3023#issuecomment-4048030988
> > > Thank you very much for this contribution. I have some minor nits, and
some questions:
> > >
> > > 1. Is there ever a situation where the underlying data would change
and it would affect the query plan such that Drill isn't returning the correct
data?
> > > 2. If possible, I'd really like to see a flag added to the metadata
that is returned which would indicate that the query plan was from cache.
> > > 3. Do you anticipate any security issues from using cached query
plans? For instance, let's say that we have user translation enabled and user
1 executes a query against a MySQL database. User 2 then tries the same query,
but does not have the same access. Would user 2 be able to access the data?
> >
> >
> > Hi, I did some research and please correct me if I'm wrong as I'm quite
new to the project:
> > Problems:
> >
> > * Cache age. As Drill has its own metadata cache lifetime, if we cache with
Caffeine, the metadata cache might expire before the query plan cache does
> >
> > Not a problem:
> >
> > * RelNode will change when a schema changes resulting in a miss on the
cache
> > * RelNode should update when there are any changes in the metadata
> >
> >
> > 2. Running a query creates a JSON profile. We can add a boolean flag
there called query_plan_from_cache. We will probably have to add this in
QueryContext or QueryInfo.
> >
> > 3. An admin user can run a query and the query plan is cached; when a
normal user then runs the same query, they get the admin's query plan.
> > To find out what happens, we'd need to do a debug session.
> > Possible outcomes:
> >
> > * The query fails, the normal user won’t be able to use that query
(blocking)
> > * The query executes with the admin plan (insecure)
> >
> > In either case we’d need a fix.
> > Possible solution:
> >
> > * We’ll probably have to add user info to the cache key.
>
> Thanks for your response. Regarding the security concern, could you try
the following:
>
> 1. Connect Drill to a JDBC database such as MySQL
> 2. Enable user translation and set up 2 different users. 1 user should
have access to the database, the other should not.
> 3. Run a query with the user that has access and verify that the plan was
cached.
> 4. Run the same query with the other user (who should not have access) and
see what happens.
>
> Also, do you know whether credentials are stored in the query plans? If so
we're going to have to think about that.
I'm sorry for the late response. We haven't had the opportunity to test this
yet. We are currently on a tight budget, so I don't know when I'll be able to
test it.
If anyone in the community is able to run this test, that would be much
appreciated; otherwise this will have to wait until we get a new budget
approved. I'm sorry for the inconvenience.
> Caching QueryPlan Results
> -------------------------
>
> Key: DRILL-8529
> URL: https://issues.apache.org/jira/browse/DRILL-8529
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Reporter: Vincent de Gans
> Priority: Minor
> Fix For: Future
>
>
> I propose introducing a caching mechanism for the output of `getQueryPlan()`
> in cases where:
> - The input SQL query is the same as a previously seen one
> - The schema or relevant metadata used for planning has not changed
> - The cached result has not expired, based on a configurable time-to-live
> (TTL)
> Proposed Caching Features
> - Toggle to enable or disable query plan caching
> - Configurable TTL-based invalidation
> - Optional schema metadata verification to detect changes in underlying data
> sources
> Motivation
> - Reduce planner overhead for repeated queries
> - Improve response times in environments where query plans are reused
> - Provide an optional optimization that can be enabled when beneficial
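
For illustration only, the TTL-based invalidation proposed above, combined with
the idea from the thread of adding user info to the cache key, could look
roughly like the sketch below. Everything in it is hypothetical: the class
PlanCacheSketch, its Lookup result, and the plain String standing in for a
query plan are not Drill classes, and the PR under discussion uses Caffeine
rather than a hand-written map. The fromCache field mirrors the suggested
query_plan_from_cache flag.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Drill. */
final class PlanCacheSketch {

  /** Lookup result carrying a "served from cache" flag. */
  static final class Lookup {
    final String plan;
    final boolean fromCache;

    Lookup(String plan, boolean fromCache) {
      this.plan = plan;
      this.fromCache = fromCache;
    }
  }

  /** A cached plan plus the time at which it stops being valid. */
  private static final class CachedPlan {
    final String plan;
    final long expiresAtMillis;

    CachedPlan(String plan, long expiresAtMillis) {
      this.plan = plan;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  // Key is the (user name, SQL text) pair, so users never share entries.
  private final Map<SimpleImmutableEntry<String, String>, CachedPlan> entries =
      new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock; // injectable so expiry can be tested

  PlanCacheSketch(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached plan if present and not expired, else plans again. */
  Lookup get(String userName, String sql, Supplier<String> planner) {
    SimpleImmutableEntry<String, String> key =
        new SimpleImmutableEntry<>(userName, sql);
    long now = clock.getAsLong();
    CachedPlan cached = entries.get(key);
    if (cached != null && now < cached.expiresAtMillis) {
      return new Lookup(cached.plan, true);
    }
    // Miss or expired entry. Two threads may both plan here; the sketch
    // accepts that duplicate work instead of locking.
    String plan = planner.get();
    entries.put(key, new CachedPlan(plan, now + ttlMillis));
    return new Lookup(plan, false);
  }

  public static void main(String[] args) {
    PlanCacheSketch cache =
        new PlanCacheSketch(60_000L, System::currentTimeMillis);
    String sql = "SELECT * FROM mysql.sales.orders";
    System.out.println(cache.get("admin", sql, () -> "admin plan").fromCache);
    System.out.println(cache.get("admin", sql, () -> "admin plan").fromCache);
    System.out.println(cache.get("guest", sql, () -> "guest plan").fromCache);
  }
}
```

With a key like this, the second user in the test scenario described in the
thread would always go through planning under their own identity. Whether that
is sufficient for Drill's user translation is exactly what the proposed test
would need to confirm.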