Repository: parquet-cpp Updated Branches: refs/heads/master 0c0c3ca8c -> 0e195730d
PARQUET-799: Fix bug in MemoryMapSource::CloseFile The key change here is to ensure that munmap is called at most once for a given memory-mapped file. Previously, MemoryMapSource::CloseFile was calling munmap on every invocation (provided that a file had ever been successfully mapped into memory for that instance). This is problematic in a multi-threaded environment, even if each MemoryMapSource instance is being used in only one thread, as illustrated by the following hypothetical sequence of operations: thread 1 @ time 1: munmap(0xf000, 4096) thread 2 @ time 2: void *addr = mmap(NULL, 4096, ...) // addr <- 0xf000 thread 1 @ time 3: munmap(0xf000, 4096) After time 3, the mapping for the memory segment beginning at 0xf000 has been invalidated, so the next attempt by thread 2 to access memory within that segment will likely cause a segfault (unless yet another thread has mmap'd that segment in the meantime, in which case the results could be even more interesting, but certainly no better). Also, I'm adding/modifying a couple comments in header files to mark "sample" implementations accordingly. This is intended to give API consumers a heads up as to the intent and level of maturity of those sections of the codebase. Author: William Forson <[email protected]> Closes #203 from wdforson/master and squashes the following commits: 782713a [William Forson] Adjust 'sample' code comments 7337551 [William Forson] PARQUET-799: Fix bug in MemoryMapSource::CloseFile Project: http://git-wip-us.apache.org/repos/asf/parquet-cpp/repo Commit: http://git-wip-us.apache.org/repos/asf/parquet-cpp/commit/0e195730 Tree: http://git-wip-us.apache.org/repos/asf/parquet-cpp/tree/0e195730 Diff: http://git-wip-us.apache.org/repos/asf/parquet-cpp/diff/0e195730 Branch: refs/heads/master Commit: 0e195730d413661d99a75d1d285f2e8593909e8c Parents: 0c0c3ca Author: William Forson <[email protected]> Authored: Mon Dec 19 18:28:07 2016 -0500 Committer: Wes McKinney <[email protected]> Committed: Mon Dec 19 18:28:07 2016 -0500 ---------------------------------------------------------------------- src/parquet/file/reader.h | 4 +++- src/parquet/util/input.cc | 5 ++++- src/parquet/util/input.h | 4 ++++ 3 files changed, 11 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/file/reader.h ---------------------------------------------------------------------- diff --git a/src/parquet/file/reader.h b/src/parquet/file/reader.h index 89dbbc8..ca28f67 100644 --- a/src/parquet/file/reader.h +++ b/src/parquet/file/reader.h @@ -79,7 +79,9 @@ class PARQUET_EXPORT ParquetFileReader { ParquetFileReader(); ~ParquetFileReader(); - // API Convenience to open a serialized Parquet file on disk + // API Convenience to open a serialized Parquet file on disk, using built-in IO + // interface implementations that were created for testing, and may not be robust for + // all use cases. static std::unique_ptr<ParquetFileReader> OpenFile(const std::string& path, bool memory_map = true, ReaderProperties props = default_reader_properties()); http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/util/input.cc ---------------------------------------------------------------------- diff --git a/src/parquet/util/input.cc b/src/parquet/util/input.cc index 10e1573..127b90c 100644 --- a/src/parquet/util/input.cc +++ b/src/parquet/util/input.cc @@ -134,7 +134,10 @@ void MemoryMapSource::Close() { } void MemoryMapSource::CloseFile() { - if (data_ != nullptr) { munmap(data_, size_); } + if (data_ != nullptr) { + munmap(data_, size_); + data_ = nullptr; + } LocalFileSource::CloseFile(); } http://git-wip-us.apache.org/repos/asf/parquet-cpp/blob/0e195730/src/parquet/util/input.h ---------------------------------------------------------------------- diff --git a/src/parquet/util/input.h b/src/parquet/util/input.h index d6d6f34..1bb41e3 100644 --- a/src/parquet/util/input.h +++ b/src/parquet/util/input.h @@ -55,6 +55,10 @@ class PARQUET_EXPORT RandomAccessSource { int64_t size_; }; +// ---------------------------------------------------------------------- +// Implementations of RandomAccessSource used for testing and internal CLI tools. +// May not be sufficiently robust for general production use. + class PARQUET_EXPORT LocalFileSource : public RandomAccessSource { public: explicit LocalFileSource(MemoryAllocator* allocator = default_allocator())
