> are the partition keys values (say country=us or country=uk) need to be 
> defined before-hand or unbounded?

The partition values themselves are unbounded; they do not need to be
defined beforehand.

> does the storage location need to have the partition key in them

In most cases there are time partitions. Besides the time partition, there
can be other partitions, which are declared in the partitions section of the
feed. So the partitions ought to appear in the path as variables. They can be
left out of the path if no consumer is interested in filtering and selecting
a section of the data through the dataIn(input, partitionSpec) function.

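To make the idea of partitions as path variables concrete, here is a minimal sketch in Python. It is not Falcon's implementation: the function name `resolve_path` and the rule that an unspecified partition becomes a `*` wildcard are assumptions made purely for illustration. The template is the one from the question below.

```python
import re
from datetime import datetime

# Location template taken from the question; the variables in ${...}
# are the partition keys (colo, country) and the time partitions.
TEMPLATE = "/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}"


def resolve_path(template, when, partition_values):
    """Hypothetical helper: fill time variables from `when` and partition
    variables from `partition_values`. A partition that is not given a
    value is replaced with '*' (an assumption, not Falcon behaviour)."""
    time_vars = {
        "YEAR": f"{when.year:04d}",
        "MONTH": f"{when.month:02d}",
        "DAY": f"{when.day:02d}",
    }

    def substitute(match):
        name = match.group(1)
        if name in time_vars:
            return time_vars[name]
        return partition_values.get(name, "*")

    return re.sub(r"\$\{(\w+)\}", substitute, template)


# A consumer interested only in country=us, for any colo.
path = resolve_path(TEMPLATE, datetime(2014, 7, 23), {"country": "us"})
print(path)  # /data/*/us/2014/07/23
```

If the template carried no `${colo}` or `${country}` variable, there would be nothing for such a filter to substitute, which is why the keys need to be in the path for filtering to work.
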
> if the partition keys are not in the FileSystem path, how does Falcon 
> identify a feed partition physical location

If the partition keys aren't specified in the path, then Falcon can't use
them either for the file system version of the input. Falcon uses partitions
in only two scenarios:

1) When data is partitioned across multiple clusters, it can be merged
   into a single location using replication (single target, multiple
   sources). For this to work, each source should own a partition
   exclusively.
2) Data can be selectively consumed by filtering on specific partitions
   through the dataIn() EL expression.

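The exclusivity requirement in the replication scenario can be sketched as follows. This is only an illustration in Python, not how Falcon does it: `plan_replication`, the partition names `colo1`/`colo2` and the target root `/data/merged` are all hypothetical. The cluster names are borrowed from the question below.

```python
def plan_replication(sources, target_root):
    """Hypothetical helper: map each source cluster to its own
    subdirectory of the target, keyed by the partition it owns.

    `sources` maps a source cluster name to the partition it owns.
    Raises ValueError if two sources claim the same partition, since
    their data would then collide in the merged location.
    """
    owners = {}
    for cluster, partition in sources.items():
        if partition in owners:
            raise ValueError(
                f"partition {partition!r} is claimed by both "
                f"{owners[partition]} and {cluster}"
            )
        owners[partition] = cluster
    return {
        cluster: f"{target_root}/{partition}"
        for cluster, partition in sources.items()
    }


# Two sources, each owning a distinct partition, merged into one target.
plan = plan_replication(
    {"sourceCluster1": "colo1", "sourceCluster2": "colo2"},
    "/data/merged",
)
```

Because each source writes under a partition no other source uses, the merged location can hold data from all of them without one overwriting another.
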
Regards,
Srikanth Sundarrajan

> From: [email protected]
> Date: Wed, 23 Jul 2014 17:16:34 -0700
> Subject: Partitions in Feed definition
> To: [email protected]
> 
> Hey all,
> 
> Few questions about Partitions:
> 
> Partitions in the FEED xml like below:
> 
>     <partitions>
>         <partition name="colo"/>
>         <partition name="country"/>
>     </partitions>
> 
> 
>    1. I see these are partition keys; are the partition keys values
> (say country=us or country=uk) need to be defined before-hand or
> unbounded?
>    2. does the storage location need to have the partition key in
> them? Like below (see the colo and country partition keys)
> 
>    <location path="/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}"
> type="data"/>
> 
>    3. if the partition keys are not in the FileSystem path, how does
> Falcon identify a feed partition physical location (actually,
> how/where is it used)? I understand if it were HCAT, the Feed
> definition has the partition key-values.
>    4. Are these partition keys and values validated against the
> FileSystem or HCAT locations?
> 
> 
> 
> Partition attribute in the Cluster reference:
> 
> Using the example from the documentation page
> <http://falcon.incubator.apache.org/docs/FalconArchitecture.html#Replication>
> 
> 
>    1. What does it mean to specify partitions in a source cluster ?
>    2. vs target cluster? (does it act like a filter to pull only a
> subset of data from source? -- if so how does Falcon know to read the
> subset in Filesystem feed?)
>    3. What data is in sourceCluster1, sourceCluster2 and what location?
>    4. Which path does the replicated data end up in the backupCluster 
> (target)?
> 
> 
> A few questions.  Hopefully it's something straightforward about
> partitions that I have missed.
> 
> 
> Thanks for your answers,
> John
                                          
