[ 
https://issues.apache.org/jira/browse/ORC-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619139#comment-17619139
 ] 

noirello commented on ORC-1288:
-------------------------------

Sorry guys, I was mistaken in my previous comment. After I adjusted the test 
assertions to be compatible with 1.8.0, it became clear that the Ubuntu (GCC 
9.4.0) runners all fail with memory corruption errors on the 
[CI servers|https://noirello.visualstudio.com/pyorc/_build/results?buildId=548&view=logs&j=cfe869c6-516e-58cd-4414-f2c0f770cbbb&t=5943fcad-88a9-57e4-7168-845780350386&l=146]
 as well. Mac seems unaffected. The Windows runners timed out, which had not 
happened before; that could be a sign that Windows is affected too.

I also made a 
[commit|https://noirello.visualstudio.com/pyorc/_build/results?buildId=549&view=results]
 where I changed the ORC lib back to 1.7.6 and nothing else, to confirm that 
there is no memory error with that version. (Don't let the failed states fool 
you: the same number of records seems to produce more stripes with 1.7.6 than 
with 1.8.0, so the assertions adjusted for 1.8.0 no longer match.)

> [C++] Invalid memory freeing with ZLIB compression
> --------------------------------------------------
>
>                 Key: ORC-1288
>                 URL: https://issues.apache.org/jira/browse/ORC-1288
>             Project: ORC
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: noirello
>            Priority: Major
>
> The following simple example ends with a segfault or a munmap_chunk(): invalid pointer error:
> {code:cpp}
> #include "orc/Common.hh"
> #include "orc/OrcFile.hh"
> using namespace orc;
> int main(void) {
>     ORC_UNIQUE_PTR<OutputStream> outStream = writeLocalFile("test_file.orc");
>     ORC_UNIQUE_PTR<Type> schema(Type::buildTypeFromString("struct<c0:int>"));
>     WriterOptions options;
>     options.setCompression(orc::CompressionKind_ZLIB);
>     options.setStripeSize(4096);
>     options.setCompressionBlockSize(4096);
>     ORC_UNIQUE_PTR<Writer> writer = createWriter(*schema, outStream.get(), options);
>     uint64_t batchSize = 65535, rowCount = 10000000;
>     ORC_UNIQUE_PTR<ColumnVectorBatch> batch = writer->createRowBatch(batchSize);
>     StructVectorBatch *root = dynamic_cast<StructVectorBatch *>(batch.get());
>     LongVectorBatch *c0 = dynamic_cast<LongVectorBatch *>(root->fields[0]);
>     uint64_t rows = 0;
>     
>     for (uint64_t i = 0; i < rowCount; ++i) {
>         c0->data[rows] = i;
>         rows++;
>         if (rows == batchSize) {
>             root->numElements = rows;
>             c0->numElements = rows;
>             writer->add(*batch);
>             rows = 0;
>         }
>     }
>     if (rows != 0) {
>         root->numElements = rows;
>         c0->numElements = rows;
>         writer->add(*batch);
>         rows = 0;
>     }
>     writer->close();
>     return 0;
> }
> {code}
> The bug depends on the stripe size, the compression block size, and the 
> number of records written to the file. I wasn't able to reproduce the error 
> with any compression kind other than ZLIB.
> It looks to me like it's related to 
> [ORC-1130|https://issues.apache.org/jira/projects/ORC/issues/ORC-1130] 
> somehow, but I couldn't work out how. (Reverting that change on the 
> main branch resolved the issue.)
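
Since the failure depends on the record count, it may help to see how the reproducer's loop splits its rows into writer->add() calls. The sketch below is only arithmetic; BatchPlan and planBatches are hypothetical names for illustration and are not part of the ORC API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the ORC API): summarizes how the
// reproducer's loop splits rowCount rows into writer->add() calls.
struct BatchPlan {
    uint64_t fullBatches;   // add() calls made inside the for loop
    uint64_t trailingRows;  // rows flushed by the final `if (rows != 0)` block
    uint64_t addCalls;      // total number of add() calls
};

inline BatchPlan planBatches(uint64_t rowCount, uint64_t batchSize) {
    assert(batchSize > 0);  // a zero batch size would never flush
    BatchPlan plan;
    plan.fullBatches = rowCount / batchSize;
    plan.trailingRows = rowCount % batchSize;
    plan.addCalls = plan.fullBatches + (plan.trailingRows != 0 ? 1 : 0);
    return plan;
}
```

With the values from the example (rowCount = 10000000, batchSize = 65535) this gives 152 full batches plus one trailing batch of 38680 rows, so 153 add() calls in total. It says nothing about where the stripe or compression block boundaries fall.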



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
