[
https://issues.apache.org/jira/browse/AVRO-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oscar Zhang updated AVRO-3194:
------------------------------
Description:
Hello,
What is the correct way to parse an Avro input stream into a vector of
`GenericRecord`?
I am trying to parse an .avro file using `GenericDatum` and store the results
in a vector of `GenericRecord`, but I get a segmentation fault (signal 11) when
I call `datum.value<avro::GenericRecord>()`. I am following the example here:
[https://stackoverflow.com/questions/55956222/how-to-read-data-from-avro-file-using-c-interface].
Here is the sample code that I am writing:
{code:java}
// `retrievedFile` is a basic_iostream from AWS S3
std::unique_ptr<avro::InputStream> avroInputStream =
    avro::istreamInputStream(retrievedFile);
// get the schema file
std::stringstream schemaInput(schemaName);
avro::ValidSchema validSchema;
avro::compileJsonSchema(schemaInput, validSchema);
// read the data input stream
// (note: validSchema is not passed to the reader here; dataSchema() is the
// schema stored in the file itself)
avro::DataFileReader<avro::GenericDatum> fileReader(std::move(avroInputStream));
avro::GenericDatum datum(fileReader.dataSchema());
std::vector<avro::GenericRecord> recordArray;
while (fileReader.read(datum)) {
    if (datum.type() == avro::AVRO_RECORD) {
        std::cout << "[Check 1]" << std::endl;
        // the next line results in a segmentation fault
        const avro::GenericRecord record = datum.value<avro::GenericRecord>();
        std::cout << "[Check 2]" << std::endl;
        recordArray.push_back(record);
    }
}
// processing the recordArray further
...{code}
Interestingly, if I use the struct generated from the JSON schema to parse the
avro file, rather than treating everything as generic, it works just fine
(I'm following the example here:
[https://avro.apache.org/docs/current/api/cpp/html/index.html#UsingAvroDataFiles]).
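
For reference, the access pattern in the loop above (check a type tag, pull out a typed value, copy it into a vector) can be modelled without Avro. The sketch below uses `std::any` as a stand-in for `avro::GenericDatum`; `Datum`, `Record`, `Kind`, and `collectRecords` are hypothetical names and not part of the Avro C++ API. It only illustrates binding the extracted value by const reference and then copying it once into the vector, and it says nothing about the actual cause of the segmentation fault.

```cpp
#include <any>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are Avro types.
enum class Kind { Record, Other };

struct Record {
    std::vector<std::string> fields;
};

class Datum {
public:
    Datum(Kind kind, std::any value) : kind_(kind), value_(std::move(value)) {}

    Kind type() const { return kind_; }

    // Returns a reference into the datum.
    // Throws std::bad_any_cast if the stored type is not T.
    template <typename T>
    const T& value() const { return std::any_cast<const T&>(value_); }

private:
    Kind kind_;
    std::any value_;
};

// Collects a copy of every record-typed datum and skips everything else.
std::vector<Record> collectRecords(const std::vector<Datum>& data) {
    std::vector<Record> out;
    for (const Datum& d : data) {
        if (d.type() == Kind::Record) {
            const Record& r = d.value<Record>();  // reference, no copy yet
            out.push_back(r);                     // one copy, into the vector
        }
    }
    return out;
}
```

In this model a mismatch between the tag and the stored type surfaces as a `std::bad_any_cast` exception rather than a crash, which makes that particular failure mode easy to rule out.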
> [C++] Parsing avro file using `GenericDatum` results in segmentation fault
> --------------------------------------------------------------------------
>
> Key: AVRO-3194
> URL: https://issues.apache.org/jira/browse/AVRO-3194
> Project: Apache Avro
> Issue Type: Bug
> Components: c++
> Environment: Ubuntu on AWS EC2
> Reporter: Oscar Zhang
> Priority: Major
> Attachments: lineorder_d.json
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)