[ 
https://issues.apache.org/jira/browse/JENA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119377#comment-13119377
 ] 

Stephen Allen commented on JENA-126:
------------------------------------

Paulo, that's a good idea.  I've been stuck thinking about the problem in terms 
of a full SPARQL server with lots of concurrent requests.  I think your idea 
could work well when you only have a single databag like in tdbloader.  I would 
be interested to see how it scales up as the number of bags increases.
                
> Change temporary table threshold policy from count to memory size
> -----------------------------------------------------------------
>
>                 Key: JENA-126
>                 URL: https://issues.apache.org/jira/browse/JENA-126
>             Project: Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Stephen Allen
>
> The "workCount" setting for temporary table sizes is not a good configuration 
> option.  Binding sizes could potentially vary from as little as 32 bytes (8 
> byte ref to the binding + 8 byte ref to a variable + 8 byte nodeID + 8 byte 
> object overhead), to some bindings with multi-megabyte strings.  Asking the 
> user to know which one it is likely to be, and then how that count translates 
> into memory usage (the real resource we are attempting to control) is already 
> way too much IMO.
>
> OK, so what the user wants is a way to specify the amount of memory that can 
> be used by each query operator for temporary tables [1][2][3].  Hmm, wait, no, 
> maybe what he wants is a way to specify the total memory used for temporary 
> tables per query?  No, maybe he wants to specify it for the whole query 
> engine.
>
> But that last paragraph is not accurate.  What he *really* wants is a system 
> that answers all of his queries for whatever data he has as fast as possible.  
> He doesn't want to have to configure any parameters.  Unfortunately, this is 
> a really hard dynamic optimization problem, so we foist it off on the user, 
> hoping he'll be able to come up with some value.
>
> We need to decide on what we want to use as a config parameter.  I believe it 
> should be a "workMem" or "tmpTableSize" setting that specifies the max memory 
> usage of a temporary table before it is converted into an on-disk table.
>
> [1] This is what most DB systems provide; specifically, PostgreSQL and MySQL 
> both have per-operator temporary table sizes.  PostgreSQL calls the setting 
> "work_mem" and MySQL calls it "tmp_table_size".
> [2] http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html
> [3] http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html
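
For illustration, the "workMem" idea in the description (spill a temporary 
table to disk once its estimated memory use, rather than its row count, passes 
a limit) could be sketched roughly as below.  This is only a sketch under 
stated assumptions, not Jena's implementation: the class WorkMemBuffer, its 
methods, and the caller-supplied size estimator are all hypothetical names, and 
the actual disk write is left as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch of a memory-size threshold for a temporary table.
 * Items are held in memory until their estimated total size exceeds
 * workMemBytes, at which point the buffer "spills".
 */
final class WorkMemBuffer<T> {
    private final long workMemBytes;          // assumed per-table memory budget
    private final ToLongFunction<T> sizeOf;   // caller-supplied size estimate, in bytes
    private final List<T> inMemory = new ArrayList<>();
    private long usedBytes = 0;
    private int spills = 0;

    WorkMemBuffer(long workMemBytes, ToLongFunction<T> sizeOf) {
        this.workMemBytes = workMemBytes;
        this.sizeOf = sizeOf;
    }

    /** Adds an item and spills if the estimated memory use passes the budget. */
    void add(T item) {
        inMemory.add(item);
        usedBytes += sizeOf.applyAsLong(item);
        if (usedBytes > workMemBytes) {
            spill();
        }
    }

    /** Placeholder: a real implementation would write the items to disk here. */
    private void spill() {
        inMemory.clear();
        usedBytes = 0;
        spills++;
    }

    int spillCount()    { return spills; }
    int inMemoryCount() { return inMemory.size(); }
    long usedBytes()    { return usedBytes; }
}
```

A caller might construct it as new WorkMemBuffer<String>(100, s -> 32 + 2L * 
s.length()), where the fixed 32 bytes echoes the minimum binding size estimated 
in the description and the 2 bytes per character is only a rough assumption.  
The point of the design is that the threshold tracks the resource actually 
being limited, so small bindings and multi-megabyte strings are treated 
differently without the user having to guess a count.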
