Erik, is there a current location for approved/recommended third-party additions? spark-packages seems to have been stale for years.

On Fri, Oct 19, 2018 at 07:06, Erik Erlandson <eerla...@redhat.com> wrote:

> Hi Matt!
>
> There are a couple ways to do this. If you want to submit it for inclusion
> in Spark, you should start by filing a JIRA for it, and then a pull
> request. Another possibility is to publish it as your own 3rd party
> library, which I have done for aggregators before.
>
>
> On Wed, Oct 17, 2018 at 4:54 PM Matt Saunders <m...@saunders.net> wrote:
>
>> I built an Aggregator that computes PCA on grouped datasets. I wanted to
>> use the PCA functions provided by MLlib, but they only work on a full
>> dataset, and I needed to do it on a grouped dataset (like a
>> RelationalGroupedDataset).
>>
>> So I built a little Aggregator that can do that, here’s an example of how
>> it’s called:
>>
>>     val pcaAggregation = new PCAAggregator(vectorColumnName).toColumn
>>
>>     // For each grouping, compute a PCA matrix/vector
>>     val pcaModels = inputData
>>       .groupBy(keys:_*)
>>       .agg(pcaAggregation.as(pcaOutput))
>>
>> I used the same algorithms under the hood as
>> RowMatrix.computePrincipalComponentsAndExplainedVariance, though this works
>> directly on Datasets without converting to RDD first.
>>
>> I’ve seen others who wanted this ability (for example on Stack Overflow)
>> so I’d like to contribute it if it would be a benefit to the larger
>> community.
>>
>> So.. is this something worth contributing to MLlib? And if so, what are
>> the next steps to start the process?
>>
>> thanks!
>>
>
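
For anyone following the thread, here is a minimal sketch of why a per-group PCA can fit the Aggregator model at all. It is not Matt's PCAAggregator and does not use Spark. It only mirrors the zero/reduce/merge/finish shape in plain Scala, and it stops at the covariance matrix, from which principal components can be derived by an eigendecomposition that is omitted here. The names CovBuffer and CovSketch are hypothetical and chosen for illustration.

```scala
// Hypothetical sketch: a mergeable buffer for per-group covariance.
// CovBuffer and CovSketch are illustrative names, not Spark or MLlib APIs.
final case class CovBuffer(n: Long, sum: Vector[Double], outer: Vector[Vector[Double]])

object CovSketch {
  // Empty buffer for d-dimensional observations.
  def zero(d: Int): CovBuffer =
    CovBuffer(0L, Vector.fill(d)(0.0), Vector.fill(d, d)(0.0))

  // Fold one observation x into the buffer: count, sum(x), sum(x * x^T).
  def reduce(b: CovBuffer, x: Vector[Double]): CovBuffer =
    CovBuffer(
      b.n + 1,
      b.sum.zip(x).map { case (s, v) => s + v },
      b.outer.zipWithIndex.map { case (row, i) =>
        row.zipWithIndex.map { case (c, j) => c + x(i) * x(j) }
      }
    )

  // Combine two partial buffers, e.g. from different partitions of one group.
  def merge(a: CovBuffer, b: CovBuffer): CovBuffer =
    CovBuffer(
      a.n + b.n,
      a.sum.zip(b.sum).map { case (p, q) => p + q },
      a.outer.zip(b.outer).map { case (r1, r2) =>
        r1.zip(r2).map { case (p, q) => p + q }
      }
    )

  // Sample covariance: (sum(x * x^T) - n * mean * mean^T) / (n - 1).
  def finish(b: CovBuffer): Vector[Vector[Double]] = {
    require(b.n > 1, "need at least two observations")
    val mean = b.sum.map(_ / b.n)
    b.outer.zipWithIndex.map { case (row, i) =>
      row.zipWithIndex.map { case (c, j) =>
        (c - b.n * mean(i) * mean(j)) / (b.n - 1)
      }
    }
  }
}
```

Because the buffer holds only sums, merging partial results in any order gives the same covariance as a single pass, which is the property a grouped aggregation relies on. This sum-of-products formula can lose precision on data with large means, so treat it as an illustration only.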