Re: [PR] PARQUET-2437: Avoid flushing at Parquet writes after an exception [parquet-mr]

via GitHub Mon, 04 Mar 2024 00:09:33 -0800


gszadovszky commented on code in PR #1285:
URL: https://github.com/apache/parquet-mr/pull/1285#discussion_r1510731818



##########
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriterError.java:
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.nio.ByteBuffer;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.parquet.bytes.DirectByteBufferAllocator;
+import org.apache.parquet.bytes.TrackingByteBufferAllocator;
+import org.apache.parquet.column.ParquetProperties;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.filter2.recordlevel.PhoneBookWriter;
+import org.apache.parquet.hadoop.codec.CleanUtil;
+import org.apache.parquet.hadoop.example.ExampleParquetWriter;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.apache.parquet.io.LocalOutputFile;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+/**
+ * Unit test to check how Parquet writing behaves in case an error happens during the writes. We use an OOM because
+ * that is the trickiest to handle. In this case we should avoid flushing since it may cause writing to already
+ * released memory spaces.
+ * <p>
+ * To catch the potential issue of writing into released ByteBuffer objects, direct memory allocation is used and at the
+ * release() call we actually release the related direct memory and zero the address inside the ByteBuffer object. As a
+ * result, a subsequent read/write call on the related ByteBuffer object will crash the whole JVM. (Unfortunately, there
+ * is no better way to test this.) To avoid crashing the test executor JVM, the code of this test is executed in a
+ * separate process.
+ */
+public class TestParquetWriterError {
+
+  @Rule
+  public TemporaryFolder tmpFolder = new TemporaryFolder();
+
+  @Test
+  public void testInSeparateProcess() throws IOException, InterruptedException {
+    String outputFile = tmpFolder.newFile("out.parquet").toString();
+
+    String classpath = System.getProperty("java.class.path");
+    String javaPath = Paths.get(System.getProperty("java.home"), "bin", "java")
+        .toAbsolutePath()
+        .toString();
+    Process process = new ProcessBuilder()
+        .command(javaPath, "-cp", classpath, Main.class.getName(), outputFile)
+        .redirectError(ProcessBuilder.Redirect.INHERIT)
+        .redirectOutput(ProcessBuilder.Redirect.INHERIT)
+        .start();
+    Assert.assertEquals(
+        "Test process exited with a non-zero return code. See previous logs for details.",
+        0,
+        process.waitFor());
+  }
+
+  /**
+   * The class to be used to execute this test in a separate process.
+   */
+  public static class Main {
+
+    private static final Random RANDOM = new Random(2024_02_27_14_20L);

Review Comment:
   This is pseudo-random with a fixed seed, so every run uses the very same data and scenario: it is 100% reproducible. I am using Random because it is simpler and shorter.
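The point of the review comment is that `java.util.Random` is deterministic for a given seed: the class specifies its algorithm, so two instances created with the same seed and driven by the same sequence of calls return identical values. A minimal sketch that reuses the test's seed; the names `SeededRandomDemo` and `draw` are hypothetical and not part of the PR:

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {

  // Same seed literal as the test's Main.RANDOM field (underscores are just digit separators).
  private static final long SEED = 2024_02_27_14_20L;

  // Hypothetical helper: draws n ints in [0, 1000) from a fresh Random seeded with SEED.
  static int[] draw(int n) {
    Random random = new Random(SEED);
    int[] values = new int[n];
    for (int i = 0; i < n; i++) {
      values[i] = random.nextInt(1000);
    }
    return values;
  }

  public static void main(String[] args) {
    int[] first = draw(10);
    int[] second = draw(10);
    // Equal seeds plus an equal sequence of calls yield equal values.
    System.out.println(Arrays.equals(first, second)); // prints true
  }
}
```

Because the generated data depends only on the seed, rerunning the forked process replays exactly the same writes; changing the seed constant changes the scenario.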


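The Javadoc of the test summarizes the idea behind PARQUET-2437: once a write has failed (for example with an OOM), the writer should not flush on close, because flushing may touch buffers that were already released. The sketch below illustrates that "abort flag" pattern with a hypothetical `GuardedWriter` that buffers bytes in memory and writes them to a `java.io.OutputStream` on `close()`. It is not the parquet-mr implementation, and the `failNextWrite` hook exists only to simulate an error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical writer: skips the final flush if any earlier write failed. */
public class GuardedWriter implements AutoCloseable {

  private final OutputStream out;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean aborted = false;

  // Hypothetical test hook: when set, the next write() throws to simulate an error.
  boolean failNextWrite = false;

  public GuardedWriter(OutputStream out) {
    this.out = out;
  }

  public void write(byte[] data) throws IOException {
    try {
      if (failNextWrite) {
        throw new IOException("simulated write failure");
      }
      buffer.write(data);
    } catch (IOException | RuntimeException | Error e) {
      // Remember the failure so close() will not flush possibly inconsistent state.
      aborted = true;
      throw e;
    }
  }

  public boolean isAborted() {
    return aborted;
  }

  @Override
  public void close() throws IOException {
    try {
      if (!aborted) {
        buffer.writeTo(out); // only flush buffered data after error-free writes
        out.flush();
      }
    } finally {
      out.close(); // always release the underlying stream
    }
  }

  public static void main(String[] args) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GuardedWriter writer = new GuardedWriter(sink)) {
      writer.write(new byte[] {1, 2, 3});
      writer.failNextWrite = true;
      writer.write(new byte[] {4}); // fails and marks the writer as aborted
    } catch (IOException e) {
      System.out.println("write failed: " + e.getMessage());
    }
    System.out.println("bytes flushed: " + sink.size()); // prints 0
  }
}
```

The flag is set for `Error` as well as exceptions because the scenario in the test is an OOM; the close path still releases the underlying stream so that resources are not leaked when the flush is skipped.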
