Dear Jan,

Thank you for your response. I have a background in SQL (MS SQL, Sybase, MySQL, 
PostgreSQL), so luckily I'm not a beginner at writing queries. However, I have 
absolutely no background in how databases work internally (other than 
hash tables).

The idea here is not really SIMD, but specially designed hardware that performs 
database functions much faster than the conventional DISK<->RAM<->CACHE<->CPU 
model. What I'm struggling to figure out is whether the latency of an 
enterprise-grade SQL script (not especially polished or advanced code, but not 
low-quality either) is memory-related, arithmetic-related, or both.

Here are some examples of what could be built:

1) A card with 16 DDR3 chips on board and a central FPGA, allowing very fast 
access to data, with around 100 small CPUs integrated in the FPGA to process a 
query.
   (The query would have to be written to take advantage of these features, 
   however.)

2) A card with XDR memory chips and around 32GB of flash chips to achieve very 
high-speed storage and retrieval (SSD<->RAM), plus an FPGA that has many cores 
inside. A query is executed by the FPGA pulling a large chunk of data from 
flash, putting it in the XDR RAM chips, and processing it in parallel.

3) A card with XDR memory chips and an FPGA on board, connected via PCI 
Express x8, which receives all the data it needs from the host computer, does 
the parallel processing on the card, and returns the results to the host.

4) A card with an FPGA containing 100 small CPUs, using PCI Express to take 
commands from the host computer and process them in parallel.

5) A card with an FPGA that has no small processors inside, but special 
circuitry that performs lookups, sorting, etc. very quickly.


These are just a few examples. The differences among them are whether the card 
has RAM, whether it has cold storage, and whether the FPGA contains many CPUs 
or special dedicated circuitry that performs, in parallel, functions that are 
difficult for a CPU to handle (imagine adding 300 numbers to another 300 
numbers in a single cycle, where a CPU with SIMD would take much longer, or 
maybe a string processor that processes many string columns with PATINDEX, 
etc. in parallel, where a CPU would be slow because it can't handle them in 
parallel).
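
To put rough numbers on the 300-number example, here is a minimal Python 
sketch. It only counts idealized "steps" for a given number of parallel adders 
(lanes); it ignores memory access entirely, and the function names and the 
4-lane figure (four 32-bit integers in one 128-bit SSE register) are 
assumptions chosen for illustration, not measurements of any real device.

```python
import math


def steps_needed(n_pairs, lanes):
    """Idealized step count: `lanes` additions complete per step."""
    if n_pairs < 0 or lanes <= 0:
        raise ValueError("n_pairs must be >= 0 and lanes must be > 0")
    return math.ceil(n_pairs / lanes)


def add_in_batches(xs, ys, lanes):
    """Add xs[i] + ys[i], handling `lanes` pairs per step.

    Returns (sums, steps). Assumes xs and ys have the same length.
    """
    sums = []
    steps = 0
    for start in range(0, len(xs), lanes):
        chunk_x = xs[start:start + lanes]
        chunk_y = ys[start:start + lanes]
        sums.extend(x + y for x, y in zip(chunk_x, chunk_y))
        steps += 1
    return sums, steps


if __name__ == "__main__":
    for lanes in (1, 4, 300):
        print(lanes, "lanes:", steps_needed(300, lanes), "steps")
```

Under these assumptions the 300 additions take 300 steps one at a time, 75 
steps with 4 lanes, and 1 step with 300 lanes. Whether that gain shows up in 
practice depends on whether the data can be delivered to the adders fast 
enough, which is exactly the question below.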

So what I'm researching is whether the work going on beneath the SQL pyramid 
is memory-intensive or logic-intensive: what to look for and what to aim for.


Thanks in advance for your help,
Nasser





On Friday, October 24, 2014 7:49 PM, Jan Lindström <[email protected]> 
wrote:
 


Hi,

The idea of SIMD (single instruction, multiple data) processing is not totally 
new, and neither is the idea of performing SQL operations inside a GPU or FPGA. 
The problem in traditional relational databases is that, in your example, 
TableX contains several columns. Picking columns A and B from pages that reside 
first on disk, then in main memory, and finally in the L1-L3 caches is not 
cheap, and the values are not in contiguous memory, because a page in cache 
also contains values for other columns that we do not even need. In a columnar 
database architecture this would be a lot easier: you just feed the columns 
containing the values for A and B directly to the SIMD operation, and every 
page in main memory contains a lot more values to process than in a traditional 
relational database, where a page also contains values for columns that are 
not needed to produce the result set of the query. Anyway, I find the proposal 
interesting and challenging.
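
The layout difference described above can be sketched in a few lines of 
Python. This is only an illustration under assumed names and sizes: TableX is 
taken to have four 8-byte integer columns (A, B, C, D), where D stands in for 
the columns the query does not need. Neither layout is MariaDB's actual page 
format, and "bytes touched" is a simple count, not a measurement.

```python
import struct
from array import array

# Hypothetical TableX rows: (A, B, C, D), each an 8-byte signed integer.
ROWS = [(i, 2 * i, i - 2, 7) for i in range(5)]
ROW_FORMAT = "<4q"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 32 bytes per row
COLUMN_NAMES = ("A", "B", "C", "D")


def to_row_store(rows):
    """Pack whole rows back to back, as a row-oriented page would."""
    return b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def to_column_store(rows):
    """Keep every column in its own contiguous array."""
    return {
        name: array("q", (row[i] for row in rows))
        for i, name in enumerate(COLUMN_NAMES)
    }


def query_row_store(page):
    """SELECT A+B, C FROM TableX WHERE C > 0 over the row layout."""
    result = []
    for offset in range(0, len(page), ROW_SIZE):
        a, b, c, _unused_d = struct.unpack_from(ROW_FORMAT, page, offset)
        if c > 0:
            result.append((a + b, c))
    # Every row is read in full, including the unused column D.
    return result, len(page)


def query_column_store(columns):
    """The same query, reading only the A, B and C arrays."""
    a_col, b_col, c_col = columns["A"], columns["B"], columns["C"]
    result = [(a + b, c) for a, b, c in zip(a_col, b_col, c_col) if c > 0]
    touched = sum(col.itemsize * len(col) for col in (a_col, b_col, c_col))
    return result, touched


if __name__ == "__main__":
    print(query_row_store(to_row_store(ROWS)))
    print(query_column_store(to_column_store(ROWS)))
```

Both functions return the same rows, but in this sketch the row layout reads 
all 32 bytes of every row while the columnar layout reads only the 24 bytes 
per row that belong to A, B and C, and each of those columns is contiguous.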

R: Jan Lindström



On Thu, Oct 23, 2014 at 3:39 PM, Nasser Ghoseiri <[email protected]> wrote:

>Dear Serg,
>
>
>Following our little chat on IRC, I'm writing this email to explain in more 
>detail what the idea is. My name is Nasser GHOSEIRI (Founder, CTO of Butterfly 
>Labs). I must note that this project is not from Butterfly Labs, 
>but will be a new company in Europe.
>
>
>Our idea is to find a way to accelerate SQL query processing by either:
>
>
>1) Creating an FPGA solution (which will later evolve into an ASIC) that 
>contains around 400 processors, allowing distributed calculation of some kind
>2) Creating an FPGA/ASIC solution that performs a large number of unrelated 
>tasks (such as addition, multiplication, etc.) in parallel
>3) Creating an FPGA/ASIC solution that allows very high-speed access to data 
>with some PRE-PROCESSING involved to accelerate the calculation
>4) Creating a very high-speed storage solution, but with no pre-processing.
>
>
>As an example with 200,000 rows, imagine: "SELECT A+B, C FROM TableX 
>WHERE C > 0". If there are 200,000 records, the processor has to perform 
>200,000 additions (A+B).
>A CPU will handle these additions one by one (or maybe a few at a time if 
>SSE, etc. is used). However, an FPGA/ASIC solution can perform 1000 additions 
>in a single cycle. This results in acceleration.
>
>
>Now, to what extent this can be effective, and what other solutions (maybe 
>string processing?) can be implemented to accelerate SQL processing, is still 
>an open question to me. But we do have experience in
>making extremely fast processors (our Butterfly Labs Monarch chip performs 400 
>billion double-SHA256 hashes per second, around 20,000 times faster than the 
>best Intel Xeon processor, which can do only twenty 
>million hashes per second).
>
>
>The project is in the brainstorming phase, and we are aware that this solution 
>will be useful to large enterprises, or companies that have to deal with 
>200,000 rows in a single "SELECT" query. The idea is to integrate
>some features in hardware, and then rewrite a portion of MariaDB to take 
>advantage of the new resources. Also, it is possible that users/companies will 
>need to rewrite their queries to make them compliant.
>
>
>You can reach me by mail at [email protected] or by phone at +33 6 72 17 26 19 
>(France).
>
>
>
>
>
>
>Best Regards,
>Nasser GHOSEIRI
>
>
>
>
>
>
>
>
>_______________________________________________
>Mailing list: https://launchpad.net/~maria-developers
>Post to     : [email protected]
>Unsubscribe : https://launchpad.net/~maria-developers
>More help   : https://help.launchpad.net/ListHelp
>
>