Erik - is there a current location for approved/recommended third-party
additions? spark-packages seems to have been stale for years.

On Fri, Oct 19, 2018 at 07:06, Erik Erlandson <eerla...@redhat.com> wrote:

> Hi Matt!
>
> There are a couple of ways to do this. If you want to submit it for inclusion
> in Spark, you should start by filing a JIRA for it, and then open a pull
> request. Another possibility is to publish it as your own third-party
> library, which I have done for aggregators before.
>
>
> On Wed, Oct 17, 2018 at 4:54 PM Matt Saunders <m...@saunders.net> wrote:
>
>> I built an Aggregator that computes PCA on grouped datasets. I wanted to
>> use the PCA functions provided by MLlib, but they only work on a full
>> dataset, and I needed to do it on a grouped dataset (like a
>> RelationalGroupedDataset).
>>
>> So I built a little Aggregator that can do that. Here’s an example of how
>> it’s called:
>>
>>     val pcaAggregation = new PCAAggregator(vectorColumnName).toColumn
>>
>>     // For each grouping, compute a PCA matrix/vector
>>     val pcaModels = inputData
>>       .groupBy(keys: _*)
>>       .agg(pcaAggregation.as(pcaOutput))
>>
>> I used the same algorithms under the hood as
>> RowMatrix.computePrincipalComponentsAndExplainedVariance, though this works
>> directly on Datasets without converting to an RDD first.
>>
>> I’ve seen others who wanted this ability (for example on Stack Overflow),
>> so I’d like to contribute it if it would benefit the larger community.
>>
>> So, is this something worth contributing to MLlib? And if so, what are
>> the next steps to start the process?
>>
>> thanks!
>>
>
