[ 
https://issues.apache.org/jira/browse/DERBY-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035611#comment-13035611
 ] 

Rick Hillegas commented on DERBY-5235:
--------------------------------------

Why have longer varchars?

It's true that you might want to sort or index these values. But you might 
merely want to ask whether longStringA = longStringB. Derby won't let you do 
that with LONG VARCHAR or CLOB. Whatever arbitrary limit you put on the length 
of your Strings, someone will need a String that is just a little longer. 
Arbitrary limits are the hallmark of bad language design. On this point, a 
broad consensus has evolved, as you can see if you google up the keywords 
"arbitrary limits programming".

What does data look like in the real world?

I can only offer anecdotal evidence here. In writing Derby-powered apps for 
myself, I am often frustrated by having to choose between a String type which 
is comparable and a String type which is big enough to hold the occasional 
large value. Some data distributions are homogeneous. But others fit a normal 
distribution, with a big lump of similar values in the middle and bizarre 
outliers in the tails. I think it is not uncommon to need a String column which 
usually holds small values but which can balloon up in edge or error cases.

What direction should we go in?

SQL has too many String types. We are stuck with the annoying semantic 
differences between CHAR and VARCHAR. Luckily for us, the Unicode basis of Java 
Strings means that we don't have to worry about NCHAR, NVARCHAR, LONG NVARCHAR, 
and NCLOB too.

I would like to see the differences among our String types vanish, not harden. 
I don't think customers want to have to trade off compactness against 
performance against usability. Ideally, there would only be one String type and 
the database would be smart enough to store and process it efficiently.

For compliance and compatibility reasons, we'll never eliminate the redundancy 
in our String types. But CLOB, LONG VARCHAR, and VARCHAR( 2147483647 ) should 
be synonyms. The customer should get the same great usability and performance 
regardless of the String type they pick.

By the way, I like Mike's suggestions about the possibility of indexing long 
Strings. Hopefully someone will want to invest in that improvement some day.


> Remove the artificial limit on the length of VARCHAR values, allowing them to 
> be java.lang.Integer.MAX_VALUE long
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5235
>                 URL: https://issues.apache.org/jira/browse/DERBY-5235
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.9.0.0
>            Reporter: Rick Hillegas
>
> The original Cloudscape limit for the length of VARCHAR values was 
> java.lang.Integer.MAX_VALUE. That is the limit in Cloudscape 5.1. Nothing in 
> Derby should break if we restore the original limit. The current limit is an 
> artificial bound introduced to make Derby agree with DB2. 32672 is the upper 
> bound on the length of a DB2 VARCHAR: 
> http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001029.htm

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to