[jira] [Commented] (GEODE-10) HDFS Integration

Dan Smith (JIRA) Wed, 27 Apr 2016 13:55:46 -0700

    [ https://issues.apache.org/jira/browse/GEODE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260925#comment-15260925 ]


Dan Smith commented on GEODE-10:
--------------------------------

This code has all been removed from develop. I created a feature branch, 
feature/GEODE-10, which has all of the code added back for anyone who wants 
to pick this up and clean it up.

Before this can be integrated back into develop:
 * The tests should pass consistently.
 * This code should be refactored into a separate subproject, rather than 
being a large set of changes to geode-core.
 * The overall spec should be discussed and approved by the dev list.

I believe this code was working as far as writing changes to HDFS and reading 
from HDFS on gets and iterations. I'm not sure whether the whole concept of 
"operational data" mentioned in the attached PDF was ever completely worked 
out, or whether any InputFormat to read the data from HDFS was ever finished.

> HDFS Integration
> ----------------
>
>                 Key: GEODE-10
>                 URL: https://issues.apache.org/jira/browse/GEODE-10
>             Project: Geode
>          Issue Type: New Feature
>          Components: hdfs
>            Reporter: Dan Smith
>            Assignee: Ashvin
>         Attachments: GEODE-HDFSPersistence-Draft-060715-2109-21516.pdf
>
>
> The ability to persist data on HDFS had been under development for GemFire. 
> It was part of the latest code drop, GEODE-8. As part of this feature we are 
> proposing some changes to the HdfsStore management API (see attached doc for 
> details).
> # The current API has nested configuration for compaction and async queue. 
> This nested structure forces the user to execute multiple steps to manage a 
> store. It also does not seem to be consistent with other management APIs.
> # Some member names in the current API are confusing.
>
> HDFS Integration: Geode as a transactional layer that microbatches data out 
> to Hadoop. This capability makes Geode a NoSQL store that can sit on top of 
> Hadoop and parallelize the process of moving data from the in memory tier 
> into Hadoop, making it very useful for capturing and processing fast data 
> while making it available for Hadoop jobs relatively quickly. The key 
> requirements being met here are:
> # Ingest data into HDFS in parallel
> # Cache bloom filters and allow fast lookups of individual elements
> # Have programmable policies for deciding what stays in memory
> # Roll files in HDFS
> # Index data that is in memory
> # Have expiration policies that allow the transactional set to decay out 
> older data
> # Solution needs to support replicated and partitioned regions
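
For anyone picking this up, the "cache bloom filters and allow fast lookups" 
requirement quoted above can be illustrated with a minimal sketch. This is not 
code from Geode or from the feature/GEODE-10 branch. The class name 
SimpleBloomFilter, its constructor parameters, and the choice of hash 
functions are all assumptions made for illustration. The idea is that a small 
in-memory filter is kept per file, so a get can skip any file whose filter 
says the key is definitely absent.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Hypothetical illustration only; not a class from Geode or the
 * feature/GEODE-10 branch. A basic Bloom filter over String keys.
 */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Records a key by setting numHashes bit positions derived from it. */
    void add(String key) {
        int h1 = hash1(key);
        int h2 = hash2(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /**
     * Returns false only if the key was definitely never added.
     * A true result may be a false positive, so the caller still has
     * to check the underlying file.
     */
    boolean mightContain(String key) {
        int h1 = hash1(key);
        int h2 = hash2(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: position i is (h1 + i * h2) mod numBits.
    // floorMod keeps the result non-negative even after int overflow.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    private static int hash1(String key) {
        return key.hashCode();
    }

    // 32-bit FNV-1a over the UTF-8 bytes, forced odd so the step is never zero.
    private static int hash2(String key) {
        int h = 0x811c9dc5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h | 1;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 4);
        filter.add("customer-42");
        System.out.println(filter.mightContain("customer-42"));
    }
}
```

A reader would consult mightContain before opening a file and fall through to 
the real lookup only on a true result. Sizing the filter (bits per key and 
number of hashes) trades memory against the false-positive rate, and that 
policy is left open here.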



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
