[
https://issues.apache.org/jira/browse/MADLIB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan updated MADLIB-1342:
------------------------------------
Summary: Mini-batch preprocessor for images - performance issue (was:
Mini-batch preprocessor for images performance)
> Mini-batch preprocessor for images - performance issue
> ------------------------------------------------------
>
> Key: MADLIB-1342
> URL: https://issues.apache.org/jira/browse/MADLIB-1342
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.17
>
>
> Follow-on from https://issues.apache.org/jira/browse/MADLIB-1334
> Improve the performance of the mini-batch preprocessor for images. This may
> involve writing a new matrix aggregation function to support
> multi-dimensional arrays.
> Timings measured on a 2-segment GP5 cluster:
> - preprocessing 50k training rows from CIFAR-10 fits into 3 buffers and takes
> ~1 hour (buffer size of 24415 is reported in the summary file)
> - preprocessing 10k training rows from CIFAR-10 fits into 1 buffer and takes
> ~2-3 minutes
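
A rough sanity check of the figures above, as a minimal sketch rather than anything taken from the MADlib source. It assumes the buffer count is simply ceil(rows / buffer_size) and that the reported buffer size of 24415 applies to both runs. Both are assumptions for illustration, and `num_buffers` is a hypothetical helper, not a MADlib function. It also compares the growth in rows with the growth in reported run time.

```python
import math

# Buffer size reported in the summary for the 50k-row run.
# Assumption: the same size is used for the 10k-row run.
BUFFER_SIZE = 24415


def num_buffers(num_rows, buffer_size):
    """Hypothetical helper: buffers needed if rows are packed
    into fixed-size buffers, with the last one partially filled."""
    return math.ceil(num_rows / buffer_size)


buffers_50k = num_buffers(50_000, BUFFER_SIZE)
buffers_10k = num_buffers(10_000, BUFFER_SIZE)

# Reported timings: ~1 hour (60 minutes) versus ~2-3 minutes.
row_ratio = 50_000 / 10_000
slowdown_low = 60 / 3   # against the 3-minute figure
slowdown_high = 60 / 2  # against the 2-minute figure

print(f"50k rows -> {buffers_50k} buffers, 10k rows -> {buffers_10k} buffer")
print(f"rows grow {row_ratio:.0f}x, run time grows "
      f"{slowdown_low:.0f}x to {slowdown_high:.0f}x")
```

Under these assumptions the buffer counts match the report (3 and 1), while a 5x increase in rows costs roughly 20x to 30x in run time, which suggests the cost grows faster than linearly with the input size.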