[
https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970112#comment-15970112
]
ASF GitHub Bot commented on AVRO-1167:
--------------------------------------
GitHub user johnsgill3 opened a pull request:
https://github.com/apache/avro/pull/217
AVRO-1167, AVRO-766: Fix c AVRO_LINK memory leaks
Fixes AVRO-766
Fixes AVRO-1167
- Correct reference accounting during `avro_schema_copy`
- Save Named Schemas (Records, Enum, Fixed) during copy into hash
- Use new Avro named schema in link target from saved hash
- Adds tests cases from patches in AVRO-766 & ARRo-1167
Running the test cases through valgrind does not generate any memory leak
errors.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/johnsgill3/avro fix-c-link-memory-leaks
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/avro/pull/217.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #217
----
commit 712c47cc1b8f1f5f5395c6ad72857ea5b73c3711
Author: John Gill <[email protected]>
Date: 2017-04-15T17:01:38Z
AVRO-1167: Enhance avro_schema_copy for AVRO_LINK
- Add hash of named schemas found during copy
- Find saved named schema for copy of AVRO_LINK
commit f587ade67f0856ebdfa1106089e7963bf60bc495
Author: John Gill <[email protected]>
Date: 2017-04-15T20:02:02Z
AVRO-766: Correct memory leaks in AVRO_LINK copy
- Adds test cases for AVRO-766 & AVRO-1167
- Corrects reference counting for avro_schema_copy
----
> Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
> -------------------------------------------------------------
>
> Key: AVRO-1167
> URL: https://issues.apache.org/jira/browse/AVRO-1167
> Project: Avro
> Issue Type: Bug
> Components: c
> Affects Versions: 1.7.1
> Environment: Ubuntu Linux 11.10
> Reporter: Vivek Nadkarni
> Attachments: AVRO-1167-TEST.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a
> new_schema, it sets the target of the new_link to the target of the old_link
> in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element
> in the old_schema.
> While this is currently safe, since the reference count of the target in the
> old_schema is incremented, we are not really making a "copy" of the schema.
> There is a "TODO" in the code, which says that we should make a
> avro_schema_copy() of the target in old_schema instead of linking directly to
> it. However, this solution of making a copy would result in a few problems:
> 1. Avro schemas are intended to be self-contained. That implies that
> AVRO_LINKs are intended to be internal links inside a self-contained schema.
> The code introduces unnecessary (and potentially disallowed) external
> dependencies in an Avro schema.
> 2. The purpose of copying a schema that we want to decouple the old_schema
> from the new_schema. The two copies may have different owners, we may want to
> deallocate old schema etc.
> 3. If the schema is recursive, then the code would enter an infinite
> recursion loop.
> It appears to me that the "correct" solution would be to replicate the entire
> structure of the current schema, including the internal links. This means
> that if old_link_A points to old_target_B, then new_link_A should point to
> new_target_B in the new schema. Note that there should only be one copy of
> new_target_B in the new schema, even if there are multiple links pointing to
> new_target_B - i.e. we should not make a new copy for each link.
> In order to implement this proper copying of links, we would need to keep a
> lookup table of pairs of old and new schemas as they are being created, as
> well as a list of all the AVRO_LINKs that are copied. Then as a post-copy
> step, we would go and fix up all the AVRO_LINKs to point to the appropriate
> targets. This is the way the schema is constructed in the first place in
> avro_schema_from_json().
> An inefficient way to obtain the correct result from avro_schema_copy() would
> be to perform an avro_schema_to_json() followed by an avro_schema_from_json().
> Note: I have not implemented a fix for this issue, but I am documenting this
> issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766
> can be fixed.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)