[ https://issues.apache.org/jira/browse/GORA-419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502113#comment-14502113 ]
ASF GitHub Bot commented on GORA-419:
-------------------------------------
Github user renato2099 commented on the pull request:
https://github.com/apache/gora/pull/23#issuecomment-94309903
Thanks a lot for the explanation @gerhardgossen! And yes, this is a problem
we have seen in other data stores as well: managing complex data types is hard
because not all data stores provide the same functionality. For example, in
gora-cassandra, depending on your mapping file, you could create subcolumns
inside a super column or even separate columns. Then, when updating maps, you
could end up updating a whole column even when a single value was modified
inside an array or map. This behaviour is of course wrong. I guess this is also
happening in Accumulo, per your test.
I think there is a trade-off here between generating a column for each
individual value of a map/array, which leads to a more complex scan operation,
and using a single column to store them all, which leads to the current behaviour.
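
To make that trade-off concrete, here is a minimal sketch in plain Java. It does not use any Gora, Cassandra, or Accumulo API; the class `MapLayoutSketch`, its method names, and the `field:key` column naming are all made up for illustration, and a row is simply modelled as an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLayoutSketch {
    // Separator between field name and map key in layout A (an assumption, not a Gora convention).
    static final String SEP = ":";

    // Layout A: one column per map entry, e.g. "meta:lang" -> "en".
    // Updating a single entry writes exactly one cell.
    public static int putEntryPerColumn(TreeMap<String, String> row,
                                        String field, String key, String value) {
        row.put(field + SEP + key, value);
        return 1; // number of cells written
    }

    // Reading layout A back needs a range scan over all columns sharing the field prefix.
    public static Map<String, String> scanEntryPerColumn(TreeMap<String, String> row,
                                                         String field) {
        String prefix = field + SEP;
        Map<String, String> result = new TreeMap<>();
        // ';' is the character right after ':', so this range covers exactly the prefix.
        for (Map.Entry<String, String> e : row.subMap(prefix, field + ";").entrySet()) {
            result.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return result;
    }

    // Layout B: the whole map lives in a single column.
    // Reading is a single lookup, but changing one entry rewrites every entry.
    public static int putSingleColumn(Map<String, Map<String, String>> row,
                                      String field, String key, String value) {
        Map<String, String> whole = new TreeMap<>(row.getOrDefault(field, new TreeMap<>()));
        whole.put(key, value);
        row.put(field, whole); // the complete map is written back
        return whole.size();   // number of entries rewritten
    }

    public static void main(String[] args) {
        TreeMap<String, String> rowA = new TreeMap<>();
        putEntryPerColumn(rowA, "meta", "lang", "en");
        putEntryPerColumn(rowA, "meta", "type", "html");
        System.out.println("layout A cells written on update: "
                + putEntryPerColumn(rowA, "meta", "lang", "de"));
        System.out.println("layout A scan result: " + scanEntryPerColumn(rowA, "meta"));

        Map<String, Map<String, String>> rowB = new HashMap<>();
        putSingleColumn(rowB, "meta", "lang", "en");
        putSingleColumn(rowB, "meta", "type", "html");
        System.out.println("layout B entries rewritten on update: "
                + putSingleColumn(rowB, "meta", "lang", "de"));
    }
}
```

In layout A the write is cheap and the read is a prefix scan; in layout B the read is a single lookup, but every update rewrites the whole value.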
In Cassandra, arrays and maps can now be stored natively, so I guess we
will be using them soon instead of adding this extra "mapping" complexity. Do
you know if Accumulo stores complex data types natively, or if it plans to?
> AccumuloStore.put deletes entire row when updating map/array field
> ------------------------------------------------------------------
>
> Key: GORA-419
> URL: https://issues.apache.org/jira/browse/GORA-419
> Project: Apache Gora
> Issue Type: Bug
> Components: gora-accumulo
> Affects Versions: 0.5, 0.6
> Environment: Gora 0.5
> Accumulo 1.5.1
> Zookeeper 3.4.6
> Hadoop 1.2.1
> Reporter: Gerhard Gossen
> Priority: Critical
>
> In {{AccumuloStore.put(k, v)}} fields of type MAP or ARRAY are cleared first
> before they are set to the new value. This is done in the methods
> {{putMap}}/{{putArray}} using a call to {{deleteByQuery(q)}}. The name for
> fields to be deleted is taken from the current column. However,
> {{deleteByQuery}} tries to translate the field names of the query to column
> names again, which fails with a log message like
> {code}
> 2015-04-13 13:43:35.084 ERROR 16733 --- [ool-46-thread-1]
> o.a.gora.accumulo.store.AccumuloStore : Mapping not found for field: ol
> 2015-04-13 13:43:35.104 ERROR 16733 --- [ool-46-thread-1]
> o.a.gora.accumulo.store.AccumuloStore : Mapping not found for field: mk
> 2015-04-13 13:43:35.115 ERROR 16733 --- [ool-46-thread-1]
> o.a.gora.accumulo.store.AccumuloStore : Mapping not found for field: mtdt
> {code}
> As a result, the query is not restricted to any field and the *entire row is
> deleted*.
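
The failure mode in the quoted report can be sketched roughly as follows. This is not the actual `AccumuloStore` code: the class `DeleteByQuerySketch`, the field names `outlinks` and `metadata`, and the guarded variant are assumptions for illustration only. The sketch only shows how passing a column name (such as `ol`) where a field name is expected can leave the delete unrestricted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DeleteByQuerySketch {
    // Hypothetical field -> column mapping; the field names are assumed, not taken from Gora.
    static final Map<String, String> FIELD_TO_COLUMN = new HashMap<>();
    static {
        FIELD_TO_COLUMN.put("outlinks", "ol");
        FIELD_TO_COLUMN.put("metadata", "mtdt");
    }

    // Translates field names to column names; unknown names are logged and skipped.
    static Set<String> resolveColumns(String... fields) {
        Set<String> columns = new HashSet<>();
        for (String field : fields) {
            String column = FIELD_TO_COLUMN.get(field);
            if (column == null) {
                System.err.println("Mapping not found for field: " + field);
            } else {
                columns.add(column);
            }
        }
        return columns;
    }

    // Unsafe variant: an empty column set is read as "no restriction", so the whole row goes.
    public static void deleteByQueryUnsafe(Map<String, String> row, String... fields) {
        Set<String> columns = resolveColumns(fields);
        if (columns.isEmpty()) {
            row.clear();
        } else {
            row.keySet().removeAll(columns);
        }
    }

    // Guarded variant: refuse to delete anything if a requested field has no mapping.
    public static void deleteByQueryGuarded(Map<String, String> row, String... fields) {
        for (String field : fields) {
            if (!FIELD_TO_COLUMN.containsKey(field)) {
                throw new IllegalArgumentException("Mapping not found for field: " + field);
            }
        }
        row.keySet().removeAll(resolveColumns(fields));
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("ol", "links");
        row.put("mtdt", "meta");
        row.put("t", "title");
        // The caller passes the column name "ol" instead of the field name "outlinks".
        deleteByQueryUnsafe(row, "ol");
        System.out.println("columns left after unsafe delete: " + row.keySet());
    }
}
```

One possible guard, under these assumptions, is to fail fast when any requested field cannot be resolved, rather than falling back to an unrestricted delete.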
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)