On 08/11/2017 04:04 AM, Christoph Hellwig wrote:
> On Mon, Aug 07, 2017 at 09:39:59AM -0700, Dave Jiang wrote:
>> +static int queue_mode = PMEM_Q_MQ;
> 
> So this changes the default for everyone.  How about those systems
> without dma engine?

The code requests a DMA channel during initialization to see if one is
available; if no DMA engines are found, it disables the DMA
implementation and falls back to the BIO path. With a separate driver,
it would just use blk-mq without DMA.
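
Roughly like this at probe time (illustrative sketch only, not the
exact patch code; PMEM_Q_BIO is the 0=BIO value of queue_mode):

	if (queue_mode == PMEM_Q_MQ) {
		dma_cap_mask_t mask;
		struct dma_chan *chan;

		dma_cap_zero(mask);
		dma_cap_set(DMA_MEMCPY, mask);

		/* probe for a MEMCPY-capable channel */
		chan = dma_request_channel(mask, NULL, NULL);
		if (!chan) {
			dev_warn(dev, "no DMA channels, reverting to BIO path\n");
			queue_mode = PMEM_Q_BIO;
		} else {
			dma_release_channel(chan);
		}
	}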

> 
>>  module_param(queue_mode, int, 0444);
>> -MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ)");
>> +MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ & DMA)");
>> +
>> +static int queue_depth = 128;
>> +module_param(queue_depth, int, 0444);
>> +MODULE_PARM_DESC(queue_depth, "I/O Queue Depth for multi queue mode");
> 
> This changes the default from the previous patch, why not introduce
> this parameter and default there.  And an explanation of the magic
> value would still be nice.

The value is tied to the number of ioatdma descriptors per channel.
The maximum is 64k descriptors per channel. A queue depth of 128 with a
possible scatterlist size of 256 entries gives 32k. That seemed a
reasonable number if the DMA channels may be shared with other users as
well. Of course, on other platforms this number will have to be
adjusted for the platform's DMA engine capability.
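
I can put that derivation in a comment so the number stops looking
like magic, e.g.:

	/*
	 * ioatdma supports up to 64k descriptors per channel. With a
	 * worst-case scatterlist of 256 entries per request, a queue
	 * depth of 128 consumes at most 128 * 256 = 32k descriptors,
	 * leaving half the channel free when it is shared with other
	 * dmaengine users. Other platforms will want to scale this to
	 * their DMA engine's limits.
	 */
	static int queue_depth = 128;
	module_param(queue_depth, int, 0444);
	MODULE_PARM_DESC(queue_depth, "I/O Queue Depth for multi queue mode");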

> 
>> +/* typically maps to number of DMA channels/devices per socket */
>> +static int q_per_node = 8;
>> +module_param(q_per_node, int, 0444);
>> +MODULE_PARM_DESC(q_per_node, "Hardware queues per node\n");
> 
> How can a fixed constant "typically map" to a hardware dependent
> resource?

My concern is that this is dependent on the platform the driver is
running on. ioatdma provides 8 channels per socket; some other DMA
engine may be different and the value would need to be altered.
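
One option is to derive it at init time instead of hard-coding 8.
Rough sketch only (it briefly grabs the channels just to count what the
dmaengine core will hand out):

	static int pmem_count_memcpy_channels(void)
	{
		struct dma_chan *chans[64];	/* arbitrary upper bound */
		dma_cap_mask_t mask;
		int i, count = 0;

		dma_cap_zero(mask);
		dma_cap_set(DMA_MEMCPY, mask);

		/* each successful request returns a distinct idle channel */
		while (count < ARRAY_SIZE(chans)) {
			chans[count] = dma_request_channel(mask, NULL, NULL);
			if (!chans[count])
				break;
			count++;
		}
		for (i = 0; i < count; i++)
			dma_release_channel(chans[i]);

		return count;
	}

q_per_node would then be the total channel count divided by the number
of nodes, with the module parameter kept as an override.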

> 
>> +    if (res) {
> 
> unlikely?

ok

> 
>> +            enum dmaengine_tx_result dma_err = res->result;
>> +
>> +            switch (dma_err) {
> 
> do you really need the local variable for a single check of it?

ok

> 
>> +            case DMA_TRANS_READ_FAILED:
>> +            case DMA_TRANS_WRITE_FAILED:
>> +            case DMA_TRANS_ABORTED:
>> +                    dev_dbg(dev, "bio failed\n");
>> +                    rc = -ENXIO;
>> +                    break;
>> +            case DMA_TRANS_NOERROR:
>> +            default:
>> +                    break;
>> +            }
>> +    }
>> +
>> +    if (req->cmd_flags & REQ_FUA)
>> +            nvdimm_flush(nd_region);
>> +
>> +    blk_mq_end_request(cmd->rq, rc);
> 
> blk_mq_end_request takes a blk_status_t.  Note that sparse would
> have caught that, so please run your code through sparse.

ok
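
The completion callback would end up roughly like this, folding in the
unlikely(), dropping the local variable, and using blk_status_t (helper
names approximated from the driver, not verbatim):

	static void pmem_dma_callback(void *data,
			const struct dmaengine_result *res)
	{
		struct pmem_cmd *cmd = data;
		struct request *req = cmd->rq;
		struct pmem_device *pmem = req->q->queuedata;
		struct nd_region *nd_region = to_region(pmem);
		struct device *dev = to_dev(pmem);
		blk_status_t rc = BLK_STS_OK;

		if (unlikely(res)) {
			switch (res->result) {
			case DMA_TRANS_READ_FAILED:
			case DMA_TRANS_WRITE_FAILED:
			case DMA_TRANS_ABORTED:
				dev_dbg(dev, "bio failed\n");
				rc = BLK_STS_IOERR;
				break;
			case DMA_TRANS_NOERROR:
			default:
				break;
			}
		}

		if (req->cmd_flags & REQ_FUA)
			nvdimm_flush(nd_region);

		blk_mq_end_request(req, rc);
	}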

> 
>> +    if (cmd->sg_nents > 128) {
> 
> WARN_ON_ONCE as this should not happen.

ok
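
i.e. (assuming the check sits in ->queue_rq, which returns
blk_status_t):

	if (WARN_ON_ONCE(cmd->sg_nents > 128))
		return BLK_STS_IOERR;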

> 
>> +    if (queue_mode == PMEM_Q_MQ) {
>>              pmem->tag_set.ops = &pmem_mq_ops;
>> -            pmem->tag_set.nr_hw_queues = nr_online_nodes;
>> -            pmem->tag_set.queue_depth = 64;
>> +            pmem->tag_set.nr_hw_queues = nr_online_nodes * q_per_node;
>> +            pmem->tag_set.queue_depth = queue_depth;
> 
> please use present nodes - we don't remap on cpu soft online/offline.

ok

> 
>>              pmem->tag_set.numa_node = dev_to_node(dev);
>> -            pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
>> +            pmem->tag_set.cmd_size = sizeof(struct pmem_cmd) +
>> +                    sizeof(struct scatterlist) * 128;
> 
> s/128/queue_depth/
Good catch. But it shouldn't be queue_depth, right? It should be
whatever we define max_segments to be.
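
Something like a shared constant would keep the two sites from
drifting apart (PMEM_MAX_SEGMENTS is a made-up name):

	#define PMEM_MAX_SEGMENTS	128

	pmem->tag_set.cmd_size = sizeof(struct pmem_cmd) +
			sizeof(struct scatterlist) * PMEM_MAX_SEGMENTS;

	/* ... and later ... */
	blk_queue_max_segments(pmem->q, PMEM_MAX_SEGMENTS);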

> 
>>              pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>              pmem->tag_set.driver_data = pmem;
>>  
>> @@ -466,7 +649,11 @@ static int pmem_attach_disk(struct device *dev,
>>      blk_queue_write_cache(pmem->q, wbc, fua);
>>      blk_queue_physical_block_size(pmem->q, PAGE_SIZE);
>>      blk_queue_logical_block_size(pmem->q, pmem_sector_size(ndns));
>> -    blk_queue_max_hw_sectors(pmem->q, UINT_MAX);
>> +    if (queue_mode == PMEM_Q_MQ) {
>> +            blk_queue_max_hw_sectors(pmem->q, 0x200000);
>> +            blk_queue_max_segments(pmem->q, 128);
> 
> Where do these magic numbers come from?
>

The 2M is a limitation of ioatdma. I really should plumb the dmaengine
subsystem to provide the transfer capability (xfer_cap).

I'm assuming max_segments limits the number of scatterlist entries to
that? I should make it a tunable...
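
Once dmaengine exposes the cap, the plumbing might look like this.
max_xfer_size is a hypothetical field that would have to be added to
struct dma_slave_caps, and max_hw_sectors is in 512-byte sectors, so
2M becomes 4096:

	struct dma_slave_caps caps;
	u64 max_xfer = SZ_2M;	/* ioatdma's limit as the fallback */

	if (!dma_get_slave_caps(chan, &caps) && caps.max_xfer_size)
		max_xfer = caps.max_xfer_size;	/* hypothetical field */

	blk_queue_max_hw_sectors(pmem->q, max_xfer >> 9);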

