[GitHub] madlib pull request #342: Minibatch Preprocessor for Deep learning

reductionista Fri, 21 Dec 2018 18:40:47 -0800

Github user reductionista commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/342#discussion_r243722566
  
    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
    @@ -580,3 +679,82 @@ class MiniBatchDocumentation:
                 for help.
             """.format(**locals())
     # ---------------------------------------------------------------------
    +    @staticmethod
    +    def minibatch_preprocessor_dl_help(schema_madlib, message):
    +        method = "minibatch_preprocessor_dl"
    +        summary = """
    +        ----------------------------------------------------------------
    +                            SUMMARY
    +        ----------------------------------------------------------------
    +        For Deep Learning based techniques such as Convolutional Neural Nets,
    +        the input data is mostly images. These images can be represented as an
    +        array of numbers where all elements are between 0 and 255 in value.
    +        It is standard practice to divide each of these numbers by 255.0 to
    +        normalize the image data. minibatch_preprocessor() is for general
    +        use-cases, but for deep learning based use-cases we provide
    +        minibatch_preprocessor_dl() that is light-weight and is
    +        specific to image datasets.
    +
    +        The normalizing constant is parameterized, and can be specified based
    +        on the kind of image data used.
    +
    +        For more details on function usage:
    +        SELECT {schema_madlib}.{method}('usage')
    +        """.format(**locals())
    +
    +        usage = """
    +        ---------------------------------------------------------------------------
    +                                        USAGE
    +        ---------------------------------------------------------------------------
    +        SELECT {schema_madlib}.{method}(
    +            source_table,          -- TEXT. Name of the table containing input
    +                                      data.  Can also be a view
    +            output_table,          -- TEXT. Name of the output table for
    +                                      mini-batching
    +            dependent_varname,     -- TEXT. Name of the dependent variable column
    +            independent_varname,   -- TEXT. Name of the independent variable
    +                                      column
    +            buffer_size,           -- INTEGER. Default computed automatically.
    +                                      Number of source input rows to pack into a buffer
    +            normalizing_const      -- DOUBLE PRECISION. Default 255.0. The
    +                                      normalizing constant to use for
    +                                      standardizing arrays in independent_varname.
    +        );
    +
    +
    +        ---------------------------------------------------------------------------
    +                                        OUTPUT
    +        ---------------------------------------------------------------------------
    +        The output table produced by MiniBatch Preprocessor contains the
    +        following columns:
    +
    +        buffer_id               -- INTEGER.  Unique id for packed table.
    +        dependent_varname       -- FLOAT8[]. Packed array of dependent variables.
    +        independent_varname     -- FLOAT8[]. Packed array of independent
    +                                   variables.
    +
    --- End diff --
    
    Assuming my previous suggestion is taken, I would write {dependent_varname}
    and {independent_varname} here to distinguish these output columns from the
    columns in the summary table, which are literal strings rather than
    references to the parameters the user passes in.
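
One practical wrinkle with this suggestion: the help text in the diff is built with `.format(**locals())`, so a bare `{dependent_varname}` would be treated as a substitution field rather than printed as written. A minimal sketch of how the placeholders could be kept literal by doubling the braces is below. `output_help` is a hypothetical stand-in for illustration, not a function in MADlib, and the column descriptions are copied from the diff above.

```python
def output_help(schema_madlib, method):
    """Hypothetical helper that mirrors the .format(**locals()) pattern
    used by the help text in the diff; not part of MADlib."""
    return """
    For more details on function usage:
    SELECT {schema_madlib}.{method}('usage')

    buffer_id                 -- INTEGER.  Unique id for packed table.
    {{dependent_varname}}       -- FLOAT8[]. Packed array of dependent variables.
    {{independent_varname}}     -- FLOAT8[]. Packed array of independent variables.
    """.format(**locals())


# Single-brace fields are filled from the function's local variables;
# doubled braces come out as literal single braces in the help text.
text = output_help("madlib", "minibatch_preprocessor_dl")
print(text)
```

With the braces doubled, the printed help shows `{dependent_varname}` and `{independent_varname}` as visible placeholders, while `{schema_madlib}` and `{method}` are still substituted as before.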
