"Yasuo Ohgaki" wrote in message news:caga2bxa4uvkl-zslab2bf05l4q_oduixszvvyzu9nddksvt...@mail.gmail.com...

Hi Tony,

<snip>

As a person who has been developing database applications for several
decades and with PHP since 2003 I'd like to chip in with my 2 cents' worth.
Firstly I agree with Dan's statement:

This type of library should be done in PHP, not in C.

Secondly, there is absolutely no way that you can construct a standard
library which can execute all the possible validation rules that may exist.
In my not inconsiderable experience there are two types of validation:
1) Primary validation, where each field is validated against the column
specifications in the database to ensure that the value can be written to
that column without causing an error. For example this checks that a number
is a number, a date is a date, a required field is not null, etc.
2) Secondary validation, where additional validation/business rules are
applied such as comparing the values from several fields. For example, to
check that START_DATE is not later than END_DATE.

Primary validation is easy to automate. I have a separate class for each
database table, and each class contains an array of field specifications.
This is never written by hand as it is produced by my Data Dictionary which imports data from the database schema then exports that data in the form of
table class files and table structure files. When data is sent to a table
class for inserting or updating in the database I have written a standard
validation procedure which takes two arrays - an array of field=value pairs
and an array of field=specifications - and then checks that each field
conforms to its specifications. This validation procedure is built into the
framework and executed automatically before any data is written to the
database, so requires absolutely no intervention by the developer.
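As a rough illustration of the two-array approach described above, here is a minimal sketch. It is not my actual framework code: the function name, the specification keys ('type', 'required', 'size') and the field names are assumptions made up for this example.

```php
<?php
// Hypothetical sketch: check an array of field=value pairs against an
// array of field=specifications and return an array of error messages.
function validatePrimary(array $fieldValues, array $fieldSpecs): array
{
    $errors = [];
    foreach ($fieldSpecs as $field => $spec) {
        $value = $fieldValues[$field] ?? null;

        // Missing or empty value: only an error if the column is required.
        if ($value === null || $value === '') {
            if (!empty($spec['required'])) {
                $errors[$field] = 'is required';
            }
            continue;
        }

        switch ($spec['type']) {
            case 'integer':
                if (filter_var($value, FILTER_VALIDATE_INT) === false) {
                    $errors[$field] = 'must be an integer';
                }
                break;
            case 'date':
                // Round-trip the value to reject dates such as 2017-02-30.
                $d = DateTime::createFromFormat('!Y-m-d', (string) $value);
                if ($d === false || $d->format('Y-m-d') !== (string) $value) {
                    $errors[$field] = 'must be a date (YYYY-MM-DD)';
                }
                break;
            case 'string':
                if (isset($spec['size']) && strlen((string) $value) > $spec['size']) {
                    $errors[$field] = 'is too long';
                }
                break;
        }
    }
    return $errors;
}
```

An empty array returned from this function means that every field conforms to its specification, so the data can be passed on to the INSERT or UPDATE.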

Secondary validation cannot be automated, so it requires additional code
to be inserted into the relevant validation method. There are several of
these which are defined in my abstract table class and which are executed
automatically at a predetermined point in the processing cycle. These
methods are defined in the abstract class but are empty. If specific code
is required then the empty method can be copied from the abstract class to
the concrete class where it can be filled with the necessary code.
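The empty-method arrangement described above might look something like the following sketch. The class names, method names and the $errors property are assumptions for illustration only, not the names used in my framework.

```php
<?php
// Hypothetical sketch of an abstract table class with an empty
// validation hook that a concrete table class may override.
abstract class AbstractTable
{
    /** Error messages collected during validation. */
    public $errors = [];

    // Empty hook: called automatically by insertRecord().
    // It does nothing unless a concrete class overrides it.
    protected function validateInsert(array $row): array
    {
        return $row;
    }

    public function insertRecord(array $row): bool
    {
        $this->errors = [];
        // (primary validation against the field specifications would run here)
        $row = $this->validateInsert($row);
        if ($this->errors !== []) {
            return false;
        }
        // (the INSERT itself would run here)
        return true;
    }
}

class Booking extends AbstractTable
{
    // Secondary validation: a business rule comparing two fields.
    protected function validateInsert(array $row): array
    {
        // Dates in YYYY-MM-DD format compare correctly as strings.
        if ($row['start_date'] > $row['end_date']) {
            $this->errors[] = 'START_DATE must not be later than END_DATE';
        }
        return $row;
    }
}
```

The calling code never invokes the hook directly. It simply calls insertRecord() and the framework runs the hook at the predetermined point.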

If there are any developers out there who are still writing code to
perform primary validation then you may learn something from my
implementation.

If there are any developers out there who think that secondary validation
can be automated I can only say "dream on".


Please let me explain the rationale behind input validation at the outermost
trust boundary. There are 3 reasons why I would like to propose the validation.
All 3 require validation at the outermost trust boundary.

1. Security reasons
Input validation should be done in a Fail Fast manner.

The language should only provide the basic features which allow values to be validated. That is what the filter functions are for. All that is necessary is for user input to be validated before any attempt is made to write it to the database.
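For anyone unfamiliar with them, this is roughly how the existing filter functions can be used in userland to reject bad input early. The helper function name is hypothetical; filter_var(), FILTER_VALIDATE_INT and the min_range/max_range options are part of the standard filter extension.

```php
<?php
// Hypothetical userland helper: fail fast on anything that is not an
// integer within the given range.
function requireIntInRange($raw, int $min, int $max): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
    if ($value === false) {
        throw new InvalidArgumentException('Invalid integer input');
    }
    return $value;
}
```

Calling requireIntInRange('42', 0, 150) returns the integer 42, while '42abc' or '200' causes an exception before the value gets anywhere near the database.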

2. Design by Contract (DbC or Contract Programming)
In order for DbC to work, validation at the outermost boundary is mandatory.
With DbC, all inputs are validated inside functions/methods to ensure
correct program execution.

Irrelevant. DbC is a methodology which PHP was never designed to support, and I see no reason why it should. If you really want DbC then switch to a language which supports it, or use a third-party extension which provides support.

However, almost all checks (in fact, all checks done by DbC support)
are disabled for production. How do we make sure the program works correctly?
All input data must be validated at the outermost boundary when DbC is
disabled. Otherwise, DbC may not work. (DbC is supposed to achieve
both secure and efficient code execution.)

3. Native PHP Types
Although my validate module is designed not to do unwanted conversions,
it converts basic types to PHP native types by default. (This can be
disabled.) With this conversion at the outermost trust boundary, native PHP
types work fluently.

What is the difference between a basic type and a PHP native type?

Although my current primary goal is 1, 2 and 3 are important as well.

2 is especially important. Providing DbC without a proper basic validation
feature does not make much sense, and could be a disaster.
Users may validate input with their own validation library, but my guess
is pessimistic. Users wouldn't do proper validation because validation
libraries and rules are too loose. There are too few validators that do
true validation that meets the requirements for 1 and 2. IMHO, even if
there are good enough validators, PHP should provide a usable validator
for core features. (DbC is not implemented, though)

It does, in the form of the filter functions.

I hope you understand my intentions and accept the feature in core.
Feature for core should be in core. IMO.

The filter functions are already in core. How these functions are used is down to userland code.

1) Primary validation, where each field is validated against the column
specifications in the database to ensure that the value can be written to
that column without causing an error. For example this checks that a number
is a number, a date is a date, a required field is not null, etc.
2) Secondary validation, where additional validation/business rules are
applied such as comparing the values from several fields. For example, to
check that START_DATE is not later than END_DATE.

Validation rules for input, logic and database may differ.
Suppose you validate "user comment" data.
Input:     0 -  10240 bytes - Input might have to allow a larger size
                              than logic, i.e. it lacks client-side
                              validation.
Logic:    10 -   1024 bytes - Logic may require a smaller range as
                              correct data.
Database:  0 - 102400 bytes - Database may allow a much larger size for
                              future extension.

In an ideal situation, all of these may be the same, but they are not in
the real world.
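To make the three ranges above concrete, here is a minimal sketch that checks the same "user comment" against each layer's limit separately. The constant and function names are hypothetical; only the byte ranges come from the example above.

```php
<?php
// Hypothetical per-layer byte limits for a "user comment" field,
// taken from the ranges listed above: [minimum, maximum].
const COMMENT_LIMITS = [
    'input'    => [0, 10240],
    'logic'    => [10, 1024],
    'database' => [0, 102400],
];

// Returns true if the comment's length in bytes is acceptable to the layer.
function commentWithinLimit(string $layer, string $comment): bool
{
    [$min, $max] = COMMENT_LIMITS[$layer];
    $length = strlen($comment); // strlen() counts bytes, not characters
    return $length >= $min && $length <= $max;
}
```

A 5-byte comment, for example, would pass the input and database checks but fail the logic check, which is the kind of difference between layers described above.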

I wouldn't aim to consolidate all validations, but I would like to avoid
unnecessary incompatibilities so that different validations can cooperate
if it is possible.

What exactly are these "unnecessary incompatibilities"?

I'm very interested in PDO level validation because SQLite3 could be very
dangerous.

Anything which is misused can be dangerous. It is almost impossible to provide a function and prevent stupid people from misusing it.

(i.e. type affinity allows strings to be stored in int/float/date/etc.
columns.) It may be useful if PDO can simply use the "validate" module's
rules or API.
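The type affinity behaviour mentioned here can be reproduced with a short sketch like the one below. It assumes the pdo_sqlite extension is available and uses a made-up table and function name; the in-memory database is discarded when the script ends.

```php
<?php
// Sketch (assumes pdo_sqlite is installed): SQLite accepts a
// non-numeric string in a column declared as INTEGER.
function storedTypeInIntegerColumn(string $value): string
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE demo (n INTEGER)');

    $stmt = $pdo->prepare('INSERT INTO demo (n) VALUES (?)');
    $stmt->execute([$value]);

    // typeof() reports the storage class SQLite actually used.
    return $pdo->query('SELECT typeof(n) FROM demo')->fetchColumn();
}
```

With this sketch a value such as 'not a number' is stored without error and reported as text, whereas '123' is converted and reported as integer.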

BTW, input validation should only validate format (characters used, length,
range, encoding) if we follow the single responsibility principle. Logical
correctness is up to the logic, i.e. the Model in MVC.

Anyway, the goal is to provide a usable basic validator for core features
and security.

If you wish to improve the filter functions then go ahead. Anything more than this would be a step too far.

Required trade offs may be allowed.

Do not waste time by trying to add into core what should be done in userland code.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net

--
Tony Marston


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
