[ 
https://issues.apache.org/jira/browse/PARQUET-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542270#comment-14542270
 ] 

Julien Le Dem edited comment on PARQUET-275 at 5/13/15 5:25 PM:
----------------------------------------------------------------

Projecting inside a set should not change the number of elements in the set.
When projecting, we just mean that we are not going to access the other fields, 
but that does not change the data content.
It is an optimization for not loading the data that we are not going to access.

In the case of the Map API, you would be able to list entries when part of the 
key is projected, but you would not be able to get something by key, since 
that requires having the entire key.
The same goes for a set: you would not be able to use add or contains, but you 
would be able to iterate and access the fields that are not projected out.

But to support this in parquet-thrift, we need to be able to use our own 
implementation of Set or Map, so the current implementation would have to change.
{noformat}
class ProjectedSet extends Set {
  add() { throw ...}
  contains() { throw ...}
  iterator() { return iterator }
}
class ProjectedMap extends Map {
  put() { throw ...}
  contains() { throw ...}
  get() { throw ...}
  entries() { return entries }
}
{noformat}
As a short-term solution, restricting projection or just adding back columns in 
those cases seems reasonable.
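
For illustration only, here is a minimal Java sketch of what such read-only 
collections could look like. It is not parquet-thrift code: it assumes the 
classes are built on java.util.AbstractSet and java.util.AbstractMap, and the 
constructors, the backing list, and the demo class are hypothetical. The backing 
store is a list because projected elements may compare equal, yet the element 
count must stay the same.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo: two keys that differed only in a projected-out field. */
final class ProjectedCollectionsDemo {
  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> rows = new ArrayList<>();
    rows.add(new AbstractMap.SimpleImmutableEntry<>("a", 1));
    rows.add(new AbstractMap.SimpleImmutableEntry<>("a", 2));
    ProjectedMap<String, Integer> map = new ProjectedMap<>(rows);
    // Listing entries works even though the projected keys look identical.
    for (Map.Entry<String, Integer> e : map.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
    System.out.println("size = " + map.size());
  }
}

/** Read-only set: iteration works, membership operations throw. */
final class ProjectedSet<E> extends AbstractSet<E> {
  // A list, not a hash set: nothing is deduplicated on read.
  private final List<E> elements;

  ProjectedSet(List<E> elements) {
    this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
  }

  @Override public Iterator<E> iterator() { return elements.iterator(); }

  @Override public int size() { return elements.size(); }

  @Override public boolean add(E e) {
    throw new UnsupportedOperationException("add on a projected set");
  }

  @Override public boolean contains(Object o) {
    throw new UnsupportedOperationException("contains on a projected set");
  }
}

/** Read-only map: entries can be listed, lookups by key throw. */
final class ProjectedMap<K, V> extends AbstractMap<K, V> {
  private final ProjectedSet<Map.Entry<K, V>> entries;

  ProjectedMap(List<Map.Entry<K, V>> entries) {
    this.entries = new ProjectedSet<>(entries);
  }

  @Override public Set<Map.Entry<K, V>> entrySet() { return entries; }

  @Override public V put(K key, V value) {
    throw new UnsupportedOperationException("put on a projected map");
  }

  @Override public boolean containsKey(Object key) {
    throw new UnsupportedOperationException("containsKey on a projected map");
  }

  @Override public V get(Object key) {
    throw new UnsupportedOperationException("get on a projected map");
  }
}
```

In this sketch both entries survive iteration, whereas rebuilding a regular 
HashMap from the same projected keys would silently keep only one of them.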


> It should not be possible to project away columns of a key in a map in 
> parquet-thrift
> -------------------------------------------------------------------------------------
>
>                 Key: PARQUET-275
>                 URL: https://issues.apache.org/jira/browse/PARQUET-275
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Alex Levenson
>
> That wouldn't make much sense: when re-assembling the map, there could be 
> unexpected key collisions.



