alamb commented on code in PR #99:
URL: https://github.com/apache/datafusion-site/pull/99#discussion_r2269720012


##########
content/blog/2025-08-15-external-parquet-indexes.md:
##########
@@ -0,0 +1,772 @@
+---
+layout: post
+title: Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet
+date: 2025-08-15
+author: Andrew Lamb (InfluxData)
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+It is a common misconception that [Apache Parquet] requires (slow) reparsing of
+metadata and is limited to indexing structures provided by the format. In fact,
+caching parsed metadata and using custom external indexes along with
+Parquet's hierarchical data organization can significantly speed up query
+processing.
+
+In this blog, I describe the role of external indexes, caches, and metadata
+stores in high performance systems, and demonstrate how to apply these concepts
+to Parquet processing using [Apache DataFusion]. *Note this is an expanded
+version of the [companion video] and [presentation].*
+
+# Motivation

Review Comment:
   Fixed in 3bf2165



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

