neoremind commented on pull request #1996:
URL: https://github.com/apache/calcite/pull/1996#issuecomment-652330139


   @XuQianJin-Stars I have addressed the comments above.
   
   For the question: *What is the production usage scenario of this MySQL 
InnoDB Java Reader?*
   Calcite's InnoDB adapter lets users query InnoDB data directly from `.ibd` 
files, as illustrated below. `innodb-java-reader` provides basic storage APIs 
such as `queryByPrimaryKey`, `queryAll` and `queryBySk`. With this adapter, 
users can query through SQL, and the embedded execution engine works well in 
most cases.
   
   ```
                         SQL Query 
                          |     |
                         /       \
                   ------         ------
                  |                     | 
                  v                     v
         MySQL Server                    
   +-----------------------+   +----------------------+        
   | SQL/Parser/Optimizer..|   |Calcite innodb adapter|   
   +-----------------------+   +----------------------+
   +-----------------------+    +--------------------+         
   | InnoDB Storage Engine |    | innodb-java-reader |
   +-----------------------+    +--------------------+
   
   ---------------------File System--------------------
   
        +------------+      +-----+
        | .ibd files | ...  |     |    InnoDB Data files
        +------------+      +-----+                         
   
   ```
   
   This bypass querying capability benefits the following scenarios:
   1. Offloading queries from the mysql-server process, or querying when 
mysql-server is not running at all. MySQL can also be backed up without 
disturbing the mysql-server process, which makes this a good alternative to 
`mysqldump` and the `mysql -e` command for dumping data.
   2. InnoDB data files can be distributed to other storage. With this 
adapter, users can then transform the data however they want, for example 
converting it to a columnar format (ORC, Parquet) for OLAP engines, or dumping 
it to heterogeneous NewSQL systems, so that data lake analytics can use it. 
Real-time results are hard to achieve because we only get a snapshot of the 
table, but this is still useful in a Lambda architecture.
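
To make the columnar transformation in scenario 2 concrete, here is a minimal sketch of pivoting row-oriented records into per-column lists. It is an illustration only: `rows_to_columns`, the column names and the sample rows are hypothetical and not part of the adapter or `innodb-java-reader`. In practice the rows would come from a full table scan, and the resulting columns would be handed to an ORC or Parquet writer.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_columns(
    column_names: Sequence[str], rows: Iterable[Sequence[Any]]
) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in column_names}
    for row in rows:
        # Reject rows whose width does not match the declared schema.
        if len(row) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} values, got {len(row)}"
            )
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# Hypothetical sample rows standing in for a table snapshot.
sample_rows = [(1, "alice"), (2, "bob")]
columns = rows_to_columns(["id", "name"], sample_rows)
```

Because the input is a snapshot of the table, the pivot can run as a batch job away from the database host.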



